1. Preface


EBNF+ is a novel way to construct: 
a parser 

You specify your required grammar in EBNF notation. The EBNF+ tool maps it into a simple program
that parses any string, such as user input, program code or binary data.

a compiler generator

Specify in your grammar what to do during the parse. You may interpret or compile the input data. As a
brain-tickling example, the tool generates itself from its own specification.

a syntax-semantic interface

Evaluate the parsed input and signal to the parser whether this input makes sense or not. Properly applied, form
then signals function and function signals form. The paper demonstrates this duality by solving some
common programming tasks.

state machines 

What if a string is the production of multiple grammars acting at the same time (e.g. multiple protocols 
over one line)? Analysis of such strings can be handled in EBNF+ with minimal effort using state 
machines. 

For demonstration, the method is applied to the EBNF grammar notation, extended with semantics expressed in
VBA (Visual Basic for Applications). A working tool, named EBNF+, is available as an Excel workbook. Although
tiny in size, it handles all examples in this paper and is a fun platform for grammar experiments as well as for
real parser projects. Anyone with Excel can therefore directly join this syntax-semantic adventure! Apart from being a
handy tool, it is above all a new way of thinking about interpreting and compiling data that reaches far beyond
the currently applied approaches.

The EBNF+ method can be used for interpreters, compilers, transactional communication, filtering and data
conversions: everywhere where input data matters. The method is brand new, so larger examples are simply not
available yet. They would not fit in this paper anyway, and above all the purpose of this paper is not application but the
presentation of the method.

Criticism and questions welcome. 
3. Definitions


The definitions of terms below hold in the context of EBNF+.


CFG 

Context Free Grammar [3], a grammar where each non-terminal is a sequence of
terminals and non-terminals

character 

a printable symbol or a printer control code represented by a value in the range 0 to 255.
By this definition a character and a byte are synonyms. EBNF+ indeed does not see
the difference. This makes syntax checking of binary files the same as that of text files.

EBNF 

Extended Backus Naur Form [1], a notation to describe a CFG 

EBNF+ 

EBNF extended with semantic expressions to link form with meaning 

form 

the sequence of characters 

grammar 

The specification of all allowed forms, i.e. a bunch of syntax rules that derive from one 
non-terminal, the start symbol 

host language 

the programming language to which the program code of the EBNF+ expressions is 
compiled. In this paper VBA [2] is the host language. 

input string 

the string that is parsed, typically text or binary file 

meaning 

the effect of executing the semantic expression 

non-terminal 

unique name that can be replaced by a sequence of terminals and non-terminals 

parse 

to check if an input string is within grammar 

parse track 

the route through the input string of compared characters during a parse 

recursion 

the case where a non-terminal is defined by itself (e.g. number = digit, number;) 

rule 

the definition of a non-terminal or state machine in EBNF+ 

rule track 

the route through the rules checked during a parse 

semantic expression 

a unit of program code embedded in the syntax that executes at parse 

semantics 

the meaning, in EBNF+ expressed as program code in the host language 

start symbol 

a non-terminal as starting rule for a parse 

state machine 

object that will change state when parse on input is successful. State machines may 
work semi parallel. 

state machine rule 

Rule where terms are states and symbols become transition conditions. Successful 
parse leads to state transition. Unsuccessful parse triggers a return to the caller of the 
rule. The state machine will resume parse when again called. 

string 

any sequence of zero or more characters 

syntax 

form as specified in EBNF+ 

syntax rule 

definition of a non-terminal as combinations of terminals and non-terminals 

term 

terminal or non-terminal 

terminal 

a string defined in the syntax rule 

RE 

Regular Expression [4], a concise notation to match and process string patterns 

Regular grammar 

a grammar where each non-terminal is a character followed by a non-terminal
4. The notation


EBNF stands for Extended Backus-Naur Form. It is a formal notation to specify grammars, e.g. of programming
languages, file formats and communication protocols. EBNF+ adds some symbols to the classic EBNF form
with which actions can be specified that relate to the syntax. EBNF+ is also the name of the parser generator tool
that uses this specification method.

From the many EBNF notations in use, I followed ISO 14977 [1] and extended it with a few constructs for EBNF+.
The applied notation is given below. If you don't like it, there are many others and, best of all, since the tool can define
itself, you may alter and extend its syntax and semantics the way you like.


Notation for syntax expressions

symbol   used for                  example
=        syntax rule               Answer = Yes | No;
{ }      repetition zero or more   String = {Character};
{ }*     repetition one or more    Integer = {Digit}*;
[ ]      option                    Yes = 'Y' , ['es'] | 'yes';
( )      grouping                  Yes = ('Y' | 'y') , ['es'];
|        alternative               Vowel = 'a'|'e'|'i'|'o'|'u';
-        exception                 Consonant = Character - Vowel;
+        in concordance            Digit = Character + Number;
,        concatenation             Integer = Digit, {Digit};
' '      terminal                  DoubleQuote = '"';
" "      terminal                  LastQuote = "'Help!'";
*        factoring                 TrippleA = 3*"A";
^        terminal as single byte   CarriageReturn = ^13;
;        termination               EmptyStatement =;
..       ranges                    Letter = 'A'..'Z' | 'a'..'z';
                                   Digit = '0'..'9';
                                   Character = ^0..^255;
#        state machine rule        Payment # msg1200, msg1210;
(* *)    comment                   (* This is a comment of 36 characters *)

Notation for semantic expressions

symbol   used for             example
« »      unconditional code   «MsgBox 'Parser was here!'»
<. .>    conditional code     <.MsgBox sLastIn & ' was last match!'.>
5. The method


Traditional parsing techniques involve dedicated and elaborate programs that pre-process your input and mix
syntax checks and semantic actions into a moloch of code, tables, trees, lists and exception handlers that only
experts can manage.

The approach of EBNF+ is different. Since EBNF specifications are simple to understand, they should surely be simple to
automate. OK, the practical problem may be the "should" part, but let's prove it.

This chapter shows how the EBNF+ method directly translates EBNF grammar expressions into program
statements and, together with some kernel functions, forms a parser virtual machine that is both small and effective. As a
logical enhancement it further discusses how you can add to the syntax the semantic actions that follow this
syntax, and vice versa.

There are two main ideas in EBNF+: 

a. the lean and mean mapping of EBNF grammar to program code 

The mapping is performed with some 10 tiny functions, each some 5 lines of program code. The traditional
step of lexical analysis prior to syntax analysis, with all its implications, turns out to be unnecessary.

b. the ability to check during the parse the meaning of the parsed input to control the parse itself

Semantic expressions in the syntax can be used to evaluate intermediate results and can signal whether the
meaning of the input makes sense or not, hence signalling valid or invalid input from a semantic
viewpoint as well.

In EBNF+ the user specifies both syntax and corresponding semantics in one text file. Syntax is in EBNF;
semantics is in the host language. The EBNF+ tool translates this combined specification directly into
program code of the host system. Execute this code with the EBNF+ kernel, and you end up with the parser
(and much more) of your desires. Of course, don't believe this until you see it, and you actually can, since an EBNF+
implementation is available that supports everything discussed in this paper.

For the examples in this paper our host is Excel VBA (sorry, I am not a professional programmer). Porting the
EBNF+ method to other language systems should be no issue. I challenge everyone to publish his/her solution!

The supplied EBNF+ tool in Excel (EBNF+*.xlsm; just email me for a copy) has one macro button to translate your EBNF+
source file (*.bnf) into a VBA module (*.bas). Further examples in EBNF+ are available, some of which are
used in this paper.

One file is paramount: the grammar file, EBNF+*.bnf. You can load and compile it. The result is the syntax-semantic
module of EBNF+ that you just used to compile it! You may wonder what came first, source or compiler.

Like many creations, it is the product of evolution. And before that there was nothing. OK, maybe some wondering
dude with itchy hands. Please feel free to use the EBNF+ idea and to extend and improve it for your world.

Now I probably have to explain how it works. So where to start? Let's jump right into some practical grammar
examples and see what EBNF+ bakes from them.

In each example in this paper a parsing related problem is tackled. Most solutions come to you in 3 steps: 

Step 1 gives the EBNF+ specification for the solution (next time, your specification).

Step 2 shows the program generated by EBNF+, which forms a specific parser as per the EBNF+ specification.

Step 3 indicates how you can apply the generated parser to input data.

Three principal forms of EBNF+ application are covered:

a. Simple syntax check. The result is True or False depending on the syntactic correctness, e.g. checking whether an email
address has the correct format (example 1).

b. Semantics follows syntax. The result of the input is a specified action, such as the assignment of variables
(examples 2 and 3). This category also includes generating a file from another file, such as in
compilation. The best example is the generation of the EBNF+ tool by the tool itself in chapter 6.

c. The meaning of the input data is evaluated as part of the rule check. The result is True or False depending on both
the syntax and the meaning of the input, e.g. checking whether a time of day or an IP address is meaningful
(examples 4 and 5).

The examples are minimal, but highlight the specific EBNF+ features. 


Translate EBNF to common control statements 

The simplest form of EBNF+ use is to check if syntax of some input data is according to some grammar. 


Example 1: Check if an email address is valid 

A common interface problem is to check the plausibility of an email address provided by end users,
for example through web sites and fill-in sheets.

Note that you can enhance (or play with) the check by improving only the syntax rule in Step 1. The other 2 steps
are handled by EBNF+. The requirement (what has to be done) is the only thing that matters to, and is specified
by, the developer, who should not have to care how it is solved. Of course some programmers will mock the
generated code. They are probably right. With EBNF+ everything is open and simple, so these birds can start
improving this tool.

Step 1: Create the rule (user input) 

Use a text editor to create your rules and save them under a nice name with the extension bnf (e.g. email.bnf).

The example rules below describe a grammar for a proper email address. Read them as follows:

- An email character (eChar) can be a digit 0 to 9, a letter a to z or A to Z, a dot, a dash or an underscore.

- An email word (eWord) consists of one or more email characters, but does not start or end with a dot. (It can also
be specified as: eWord= {eChar}+ -'.')

- An email address (eMailAddress) is the sequence of an email word, the @ character and another email word.

This one I did, but specifying the grammar should normally be your creative task. To make you as lazy as I am, it must
be said, though, that grammars in EBNF are readily available for common expressions and languages. And sure,
the scrutiny of proper email address formats can be improved.

A warning on elegance. EBNF rules are quite intuitive and easy to understand, but creating them is another
thing. It takes thinking, testing, correcting and rethinking why it still looks awful, but in the end it is just
perfect, simple and satisfying.

eChar = 'a'..'z' | 'A'..'Z' | '0'..'9' | '_' | '-' | '.';

eWord = eChar - '.' , {eChar} - '.';

eMailAddress = eWord, '@', eWord;


Step 2: Compile the rule (EBNF+ output) 

Compile the rule using the button in the EBNF+ tool: 


Compile File (bnf >bas) 


Browse to the bnf file, select it and click OK. Within a second the tool generates a file with the same name, but
with the extension bas.

The EBNF+ tool translates the rule into executable VBA (shown below), which can be loaded as a module or added to
your existing project. As a service, you will find your EBNF source as a comment below each function.
The tool considerably shortens development effort. Traditionally, the 3-line specification in the example would be coded by
an enthusiastic programmer either in many lines or with a crafted regular expression embedded in awkward
loops, often hard to understand, change and maintain.

With EBNF+ you just skip the programming part and let the tool do it. Nevertheless, understanding the host code
helps the casual user to understand the functionality of the parser. For the parsing formalist and engineer, it is a
nice subject to think about further.

Now, if you have not already done so, compare the specification with the generated code. The compiling process is straightforward.
There is a one-to-one relationship between your source and the object code:

- Each definition of a non-terminal becomes a function. The name of the non-terminal becomes the name of the
function. The rule itself is the function body.

- Each non-terminal in the rule is compiled to a call to the function with the name of that non-terminal.

- Terminal symbols are mostly parsed by the function pIn (parse Input data). This function is the heart of the
method and counts only 8 lines of code. It checks whether its argument matches the string in the input. Depending on the result,
the public parser Boolean pOk is set to True or False.

- The functions cPush, cTop and cDrop keep track of and control the parser status (5 variables). They allow the
parser to backtrack in case pIn delivers a False pOk (the Boolean that represents the result of the parse) or, in parsing
terminology, in case the parser hits a blind alley.

EBNF+ uses only a few kernel functions and variables. Together with the compiled code below, they form a small
virtual machine that constitutes the parser. This leanness makes understanding the method as easy as kissing (with
some nice experimenting). Here we specified the machine in VBA, but you can no doubt port it to any host language.
Since the kernel is in total only a few pages of code, porting to any lower- or higher-level language should be
done in a hop. To keep up the suspense, the description of the kernel is in the next chapter.

Function eChar()
    cPush
    Call pInterval("a", "z")
    If Not pOk Then
        Call pInterval("A", "Z")
        If Not pOk Then
            Call pInterval("0", "9")
            If Not pOk Then
                pIn ("_")
                If Not pOk Then
                    pIn ("-")
                    If Not pOk Then
                        pIn (".")
                    End If
                End If
            End If
        End If
    End If
    cDrop
End Function
' eChar = 'a'..'z' | 'A'..'Z' | '0'..'9' | '_' | '-' | '.';

Function eWord()
    cPush
    eChar
    If pOk Then
        Do
            eChar
        Loop While pOk And Not blnEnd
        pOk = True
        If pOk Then
            pNotIn (".")
        End If
    End If
    cDrop
End Function
' eWord = eChar, {eChar} - '.';
Function eMailAddress()
    cPush
    eWord
    If pOk Then
        pIn ("@")
    End If
    If pOk Then
        eWord
    End If
    cDrop
End Function
' eMailAddress = eWord, '@', eWord;
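The one-to-one mapping from rules to functions can also be tried outside VBA. The following is a minimal sketch in Python, not a port of the EBNF+ kernel (which is only described in the next chapter). The class Parser, its methods p_in and p_interval, and the way backtracking is done (each rule restores the position itself instead of using cPush/cDrop) are assumptions made for illustration. The eWord rule follows the textual description: a word neither starts nor ends with a dot.

```python
class Parser:
    """Tiny backtracking matcher; a stand-in for the EBNF+ kernel, not a port of it."""

    def __init__(self, text):
        self.text = text      # the input string (sIn in the paper)
        self.pos = 0          # next character to parse (0-based here)
        self.max_pos = 0      # deepest position reached, useful for error messages

    def at_end(self):
        return self.pos >= len(self.text)

    def p_in(self, s):
        """Match the literal s at the current position."""
        if self.text.startswith(s, self.pos):
            self.pos += len(s)
            self.max_pos = max(self.max_pos, self.pos)
            return True
        return False

    def p_interval(self, lo, hi):
        """Match one character in the inclusive range lo..hi."""
        if not self.at_end() and lo <= self.text[self.pos] <= hi:
            self.pos += 1
            self.max_pos = max(self.max_pos, self.pos)
            return True
        return False


def e_char(p):
    # eChar = 'a'..'z' | 'A'..'Z' | '0'..'9' | '_' | '-' | '.';
    return (p.p_interval("a", "z") or p.p_interval("A", "Z")
            or p.p_interval("0", "9")
            or p.p_in("_") or p.p_in("-") or p.p_in("."))


def e_word(p):
    # One or more eChar, neither starting nor ending with a dot.
    start = p.pos
    if not e_char(p) or p.text[start] == ".":
        p.pos = start
        return False
    while e_char(p):
        pass
    if p.text[p.pos - 1] == ".":
        p.pos = start         # backtrack: the word ended with a dot
        return False
    return True


def e_mail_address(p):
    # eMailAddress = eWord, '@', eWord;
    start = p.pos
    if e_word(p) and p.p_in("@") and e_word(p):
        return True
    p.pos = start
    return False


def parse_string(text, rule):
    """True only if the rule matches and consumes the complete input."""
    p = Parser(text)
    return rule(p) and p.at_end()
```

Calling parse_string("test@test.be", e_mail_address) applies the start symbol to the whole input. Because this sketch reads the rules in its own way, it need not agree with the tool on every input.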


Step 3: Apply the grammar (by message to user) 

Below is a demo of the email rule in action. As for all EBNF+ parsing, the input data is assigned to the EBNF+ public
string variable sIn. Furthermore, three parser parameters are initialised:

iInPnt points to the first character in sIn where the parse starts. During the parse it advances, at least
if the syntax of the input string is correct. At the end of a successful parse, iInPnt points just past the last
character of sIn.

iInPntMax points to the character in sIn where the parse process went deepest into sIn while checking whether it
obeyed the syntax rule. This parameter is meant for error messages. It shows at what point it probably went
wrong.

iContext indicates the current level of the parser. If a match between a terminal and the substring in the
input fails, the parser drops back to the level at which the syntax rule still held. Each level holds a
copy of all parser parameters.

Essential is the function eMailAddress, which has been compiled from the rule for the non-terminal eMailAddress in
the grammar. Running this function applies the grammar (for the non-terminal eMailAddress) to the input
sIn. In grammar world this non-terminal is known as the start symbol. Funky, heh?

The result of the execution is the public Boolean pOk, which tells you whether the input was in line with the grammar.

What you do with pOk is up to your imagination again. You define what to do when the input is grammatically correct
or incorrect.

' Apply the grammar somewhere in your code, e.g.:

sIn = "heny@telenet.be"
iInPnt = 1
iInPntMax = 1
iContext = 0

eMailAddress
If pOk Then
    MsgBox sIn & " looks like a correct email address"
Else
    MsgBox sIn & " is an ill-formed email address"
End If


Step 3: Apply the grammar (using Excel sheet)

Since we are using Excel, let's make use of its interface. Below is another example of using the email rule, now applied
in an Excel sheet for demonstration.

Here the function ParseString is used, an Excel-oriented EBNF+ function that you can use in your workbook or macro
for experimentation.

As you can see in the screen snippet, the ParseString function has two arguments. The first argument
forms the input data to be parsed, taken from cell A29 (h.2@be), while the second argument takes the string
representation of the name of the grammar definition from cell B$26 ("eMailAddress").

For the VBA geeks:

Function ParseString(sInput As String, sGrammar As String) As Boolean
    sIn = sInput
    iInPnt = 1
    iInPntMax = 1
    iContext = 0
    sOut = ""

    Application.Run sGrammar

    ParseString = pOk And blnEnd
End Function

This code betrays a lot about the internal mechanism of EBNF+. Notice the 4 parser parameters already discussed.
They get initialised here for free. There is a 5th, sOut, which holds the string that is built up when the tool is
used to deliver output, such as for compilation, file generation or intermediate data results. In this example
we have nothing to deliver as a string, so it is unused.

Further note that the typed function ParseString returns True only when the grammar applies correctly to the
complete input string. In the underlying parser machine this is fulfilled by checking the intermediate parse
result pOk (if True, the parse so far is successful) and blnEnd (if True, the parser has checked every character in the input sIn
for conformance with the production of the start symbol).
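The difference between "the rule matched" and "the rule matched the complete input" can be sketched in a few lines of Python. This is an illustration under assumptions, not the EBNF+ code: the function names parse_number and parse_string and the returned tuple are hypothetical, and the grammar is just number = digit, {digit}.

```python
def parse_number(text):
    """Match digit, {digit} from the start; return (p_ok, at_end, max_pos)."""
    pos = 0
    while pos < len(text) and "0" <= text[pos] <= "9":
        pos += 1
    p_ok = pos > 0                 # at least one digit matched
    at_end = pos == len(text)      # every input character was consumed
    return p_ok, at_end, pos       # pos doubles as the deepest position reached


def parse_string(text):
    """Mirror of 'pOk And blnEnd': accept only a match of the complete input."""
    p_ok, at_end, _ = parse_number(text)
    return p_ok and at_end
```

For the input "123abc" the rule itself succeeds on the prefix "123", but the input as a whole is rejected, and the returned position 3 indicates where it probably went wrong.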

' Apply the grammar in a cell formula, e.g. in cell B29:

=parsestring(A29, B$26)

      A                    B
26                         emailaddress
27    test@test.be         TRUE
28    test @test           FALSE
29    h.2@be               TRUE
30    h.@test              FALSE
31    h@test               TRUE
32    h@test               FALSE
33    han_test@tst@tes     FALSE
34    han.test@tsLtest     TRUE
35    han_test@_tst.est    FALSE
36    han_test@@test       FALSE
37    @test.tst            FALSE


Semantics follows syntax 

Now that you can check syntax, it is time to link some action to it. As said, syntax is specified in EBNF, while semantics is
specified in host code. To help EBNF+ locate your code, you enclose it between the '«' and '»' symbols, e.g.:

eChar = ('a'..'z'|'A'..'Z') «MsgBox "Input data is checked if it is a character."»;

Although it is arbitrary to the parser, my convention is to put the syntax on the left side of the source text and the action code
on the right, just to distinguish syntax from semantics for the reader (and writer).

The MsgBox above (a standard VBA function that displays its argument) will always execute here, no matter whether the parsed
input contains a matching character or not.

If you want code to be executed only when the parsed input meets the syntax specification checked so far, you can enclose
the code between the '<.' and '.>' symbols, e.g.:

eChar = ('a'..'z'|'A'..'Z') <.MsgBox "Input data is a character!".> ;


A more flexible way of specifying semantics is to use functions that perform the required actions. The following code is equivalent
in functionality to the above:

«Function SmartRemark ()»
«    MsgBox "Input data is a character!"»
«End Function»

eChar = ('a'..'z'|'A'..'Z') , SmartRemark ;


SmartRemark is specified in the syntax rule eChar as if it were a non-terminal of the syntax, but in this case
SmartRemark is a standard VBA function instead of yet another syntax rule. The comma that precedes SmartRemark
marks concatenation in the syntax. You may read the line as: an eChar is a letter concatenated with a SmartRemark.
Quite a natural way of specifying some semantics in the syntax, I would say.

Note that the comma is a syntax symbol, but here it is used to channel the execution of program code!

Now please analyse the following snippet: 

«Function PositiveRemark ()»
«    MsgBox "Parsed data is indeed a character!"»
«End Function»

«Function NegativeRemark ()»
«    MsgBox "Parsed data is NOT a character!"»
«End Function»

eChar = ('a'..'z'|'A'..'Z') , PositiveRemark | NegativeRemark ;


Here we use the syntax Or-symbol (the 2nd '|' character) to channel semantics again. It acts when the input so far is
not the required character.

Some detailed notes that are logical but maybe surprising at first thought. 

Firstly, when the thread of execution ever reaches NegativeRemark then the parser pointer (named pin, pointer that 
points to the next string to be parsed) is pointing to the data that apparently is not a character in the range specified. 
This is caused by the backtracking mechanism implemented in EBNF+. 

This mechanism is not absolutely necessary for parsers, but it allows a wider range of grammars. The world behind 
this is discussed in [3]. 

Secondly, the option part (part right to the 2nd Or-symbol) does not further specify syntax: the eChar rule is always 
fulfilled and will result in pOk=True. 
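The three ways of attaching actions discussed in this section (unconditional, conditional, and a semantic-only alternative) can be compared side by side in a small Python sketch. All names are hypothetical, a list called log stands in for the message boxes, and a rule returns the new input position or None on failure; this is an analogy under those assumptions, not the code EBNF+ generates.

```python
def letter(text, pos):
    """Return the position after one ASCII letter at pos, or None if there is none."""
    if pos < len(text) and ("a" <= text[pos] <= "z" or "A" <= text[pos] <= "Z"):
        return pos + 1
    return None


def e_char_unconditional(text, pos, log):
    new_pos = letter(text, pos)
    log.append("checked")          # like « ... »: runs whether or not the match succeeded
    return new_pos


def e_char_conditional(text, pos, log):
    new_pos = letter(text, pos)
    if new_pos is not None:        # like <. ... .>: runs only after a successful match
        log.append("is a character")
    return new_pos


def e_char_with_fallback(text, pos, log):
    new_pos = letter(text, pos)
    if new_pos is not None:
        log.append("positive")     # first alternative: letter , PositiveRemark
        return new_pos
    log.append("negative")         # second alternative: no syntax, only an action
    return pos                     # the rule succeeds without consuming any input
```

In the last variant the position returned after a failed letter match is the original one, which corresponds to the backtracking described above: the fallback action sees the input that was not a letter, and the rule as a whole always succeeds.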


Example 2: Extract drive, path and file name from fully qualified path file name

Step 1a: Create the rule

Here we create rules for parsing full path file names. Wow, more rules than I expected. Not very regular then :-)

The top-level non-terminal is PathFileName, the start symbol. The syntax rule states that it consists of an optional drive
and path name followed by a mandatory file name. Now our semantic assignment is: extract these names from an
arbitrary path file name and assign them to some nice variables.

PFChar = 'a'..'z'|'A'..'Z'|'0'..'9'|'_'|'-'|'.';

PFWord = PFChar , { PFChar};

DriveName = ('a'..'z'|'A'..'Z') , ':' ;

PathName = [PFWord-'.'] , '\' , {PFWord-'.' , '\'};

FileName = PFWord - '.' ;

PathFileName = [DriveName],[PathName],FileName;


Step 1b: Add semantics to the rule

«Dim sPath As String»
«Dim sDrive As String»
«Dim sFile As String»

PFChar = 'a'..'z'|'A'..'Z'|'0'..'9'|'_'|'-'|'.';

PFWord = PFChar , { PFChar } ;

DriveName = ('a'..'z'|'A'..'Z') , ':' ;

PathName = [ PFWord-'.' ] , '\' , { PFWord -'.', '\' } ;

FileName = PFWord - '.' ;

PathFileName = [DriveName    «sDrive = sLastIn»
               ],[PathName   «sPath = sLastIn»
               ],FileName    «sFile = sLastIn»
               , MsgBox('Drive=' & sDrive & ' Path=' & sPath & ' File=' & sFile);

The six added statements do the job.

For the demo, a message shows the result.

In EBNF+ only one variable is used to pass values. Nice and simple. The variable is sLastIn. This is a handy EBNF+
run-time string variable that contains the string of the last completed non-terminal rule or parsed terminal symbol. It
is empty when the rule did not hold.

Note that the message is again in the form of a dummy terminal. Again the comma in front tells the parser that the
MsgBox procedure is only called when the parse has been successful so far. Again, a syntax symbol (",") channels the
semantic behaviour.


Step 2: Compile the rule and actions


EBNF+ compiles the rules into VBA (below).

As one can see, the generated code easily gets long, but you may just thank the compiler for doing the job.

By the way, the highlighting is not generated by the compiler. It only serves to show you where your semantic specification got
embedded.


' Compiled by: EBNF+arser 0V35 
' Date: 23-12-2013 00:24:31 
Option Explicit 

Dim sPath As String 
Dim sDrive As String 
Dim sFile As String 

Function PFChar()
    cPush
    Call pInterval("a", "z")
    If Not pOk Then
        Call pInterval("A", "Z")
        If Not pOk Then
            Call pInterval("0", "9")
            If Not pOk Then
                pIn ("_")
                If Not pOk Then
                    pIn ("-")
                    If Not pOk Then
                        pIn (".")
                    End If
                End If
            End If
        End If
    End If
    cDrop
End Function
' PFChar = 'a'..'z'|'A'..'Z'|'0'..'9'|'_'|'-'|'.';

Function PFWord()
    cPush
    PFChar
    If pOk Then
        Do
            cPush
            PFChar
            cDrop
        Loop While pOk And Not blnEnd
        pOk = True
    End If
    cDrop
End Function
' PFWord = PFChar , { PFChar };

Function DriveName()
    cPush
    Call pInterval("a", "z")
    If Not pOk Then
        Call pInterval("A", "Z")
    End If
    If pOk Then
        pIn (":")
    End If
    cDrop
End Function
' DriveName = ('a'..'z'|'A'..'Z') , ':' ;

Function PathName()
    cPush
    PFWord
    If pOk Then
        iInPnt = iInPnt - Len(sLastTerm)
        cPush
        pIn (".")
        pOk = Not (pOk)
        cDrop
    End If
    pOk = True
    If pOk Then
        pIn ("\")
    End If
    If pOk Then
        Do
            cPush
            PFWord
            If pOk Then
                iInPnt = iInPnt - Len(sLastTerm)
                cPush
                pIn (".")
                pOk = Not (pOk)
                cDrop
            End If
            If pOk Then
                pIn ("\")
            End If
            cDrop
        Loop While pOk And Not blnEnd
        pOk = True
    End If
    cDrop
End Function
' PathName = [ PFWord-'.' ] , '\' , { PFWord -'.', '\' } ;

Function FileName()
    cPush
    PFWord
    If pOk Then
        iInPnt = iInPnt - Len(sLastTerm)
        cPush
        pIn (".")
        pOk = Not (pOk)
        cDrop
    End If
    cDrop
End Function
' FileName = PFWord - '.' ;


Function PathFileName()
    cPush
    DriveName
    sDrive = sLastTerm
    pOk = True
    If pOk Then
        PathName
        sPath = sLastTerm
        pOk = True
    End If
    If pOk Then
        FileName
        sFile = sLastTerm
    End If
    If pOk Then
        MsgBox ("Drive=" & sDrive & " Path=" & sPath & " File=" & sFile)
    End If
    cDrop
End Function
' PathFileName = [DriveName   «sDrive = sLastTerm»
'                ],[PathName  «sPath = sLastTerm»
'                ],FileName   «sFile = sLastTerm»
'                , MsgBox('Drive=' & sDrive & ' Path=' & sPath & ' File=' & sFile) ;



Step 3: Apply the grammar 



Formula in the result cell of row 42: =parsestring(A42, B$39)

      A                  B               result
39                       PathFileName
40    C:test                             TRUE
41    C:\test                            TRUE
42    C:t1\t2.exe                        =parsestring(A42, B$39)
43    c:\t1\t3\t4.txt                    TRUE
52                                       FALSE

[Screenshot: while row 42 is being calculated, a Microsoft Excel message box shows "Drive=C: Path=t1\ File=t2.exe" with an OK button.]


For a sheet demonstration I again used the ParseString function in the cells. When Excel calculates each formula in
column D, the message box pops up with the drive, path and file name found. (It will nag you too when you do the
examples.)
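The same extraction can be sketched in Python to see the optional parts and the backtracking at work. This is a hand-written analogue under assumptions, not output of the tool: the function names are hypothetical, "PFWord - '.'" is read as "a word that does not start with a dot", and each function returns the end position of its match or None instead of setting pOk.

```python
import string

PF_CHARS = set(string.ascii_letters + string.digits + "_-.")


def pf_word(text, pos):
    """PFWord - '.': one or more PFChar, not starting with a dot. Returns end or None."""
    end = pos
    while end < len(text) and text[end] in PF_CHARS:
        end += 1
    if end == pos or text[pos] == ".":
        return None
    return end


def drive_name(text, pos):
    """DriveName: one letter followed by a colon."""
    if pos + 1 < len(text) and text[pos] in string.ascii_letters and text[pos + 1] == ":":
        return pos + 2
    return None


def path_name(text, pos):
    """PathName: optional word, a backslash, then zero or more 'word backslash' pairs."""
    end = pf_word(text, pos)
    cur = end if end is not None else pos      # the leading word is optional
    if cur >= len(text) or text[cur] != "\\":
        return None
    cur += 1
    while True:
        end = pf_word(text, cur)
        if end is None or end >= len(text) or text[end] != "\\":
            return cur                          # iteration failed: keep what matched so far
        cur = end + 1


def path_file_name(text):
    """PathFileName: [DriveName], [PathName], FileName. Returns the parts or None."""
    parts = {"drive": "", "path": "", "file": ""}
    pos = 0
    end = drive_name(text, pos)
    if end is not None:
        parts["drive"] = text[pos:end]
        pos = end
    end = path_name(text, pos)
    if end is not None:
        parts["path"] = text[pos:end]
        pos = end
    end = pf_word(text, pos)
    if end is None or end != len(text):
        return None                             # no file name, or trailing input left over
    parts["file"] = text[pos:end]
    return parts
```

For the input C:t1\t2.exe the word t2.exe is first tried as part of the path, fails on the missing backslash, and is then picked up by the file name rule, which gives the same split as the message box in the sheet demonstration.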
Example 3: Assign arithmetic value to a parsed object

In this example, which calculates the molecular weight from a chemical formula, we will write syntax and
semantics in one go and skip the object code. You should be an EBNF+ expert by now.


Step 1: Create the rule and semantics

' EBNF+ example that parses a chemical formula and calculates its molecular weight.

«Dim AW As Single»
«Dim EW As Single»
«Dim MW As Single»

Digit = '2'..'9';

Count = Digit , {Digit} ;

Element = 'O'     «AW=15.9994»
        | 'H'     «AW=1.00794»
        | 'Na'    «AW=22.9897»
        | 'Cl'    «AW=35.4527»
        | 'C'     «AW=12.0107»;

ElementCount = Element , [Count]    «IF sLastIn="" THEN       »
                                    «   EW=AW                 »
                                    «ELSE EW=Int(sLastIn)*AW  »
                                    «ENDIF                    »;

Formula = ElementCount       «MW=EW»
        , {ElementCount      «MW=MW+EW»
        };

Function MolecularWeight (sFormula As String) As Single
    If ParseString(sFormula, "Formula") Then
        MolecularWeight = MW
    End If
End Function


Declare 3 helper variables AW, EW and MW that hold the atomic, element and molecular weight.

Declare the non-terminal that lists the element options and assigns the correct atomic weight to the helper variable
AW.

In chemical formulas the count is optional and defaults to one if unspecified. Now how do we know when Count is
skipped? From the parser's perspective, an attempt is made to parse an optional element too. If the attempt fails, sLastIn
(the string variable that holds the last [non-]terminal parsed) is empty. We can use this property in general to properly
process syntax options that fall back to default values in the semantic part.

For the molecular weight calculation we define a regular Excel function (MolecularWeight).


Step 2: Compile the rule and actions

Compile the example and load the module. I skip showing the generated code. You can go through the
resulting *.bas file if you are really curious.


Step 3: Apply the grammar
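To make the flow of AW, EW and MW concrete, here is a minimal Python sketch of the same calculation. It is an analogue written by hand, not the generated VBA: the function name molecular_weight and the table ATOMIC_WEIGHTS are hypothetical, the count digits follow the Digit rule as given above ('2'..'9'), and a missing count defaults to one just as the empty sLastIn does.

```python
# Element alternatives in the order of the grammar; 'Cl' must be tried before 'C'.
ATOMIC_WEIGHTS = [("O", 15.9994), ("H", 1.00794), ("Na", 22.9897),
                  ("Cl", 35.4527), ("C", 12.0107)]


def molecular_weight(formula):
    """Return the molecular weight, or None if the formula is not within the grammar."""
    if not formula:
        return None                       # Formula needs at least one ElementCount
    pos = 0
    mw = 0.0
    while pos < len(formula):
        for symbol, aw in ATOMIC_WEIGHTS:
            if formula.startswith(symbol, pos):
                break                     # symbol and aw now hold the matched element
        else:
            return None                   # no element alternative matched here
        pos += len(symbol)
        start = pos
        while pos < len(formula) and "2" <= formula[pos] <= "9":
            pos += 1
        count = int(formula[start:pos]) if pos > start else 1   # skipped Count means 1
        mw += count * aw                  # EW = count * AW, MW = MW + EW
    return mw
```

For example, molecular_weight("H2O") adds 2 * 1.00794 and 15.9994, while an input with an unknown element symbol yields None instead of a weight.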


Formula in cell C65: =MolecularWeight(A65)

      A          B       C
62    Formula            Molecular Weight
63    H2O        TRUE    18.02
64    CO2        TRUE    44.01
65    CH3COOH    TRUE    60.051
66    C6H8O7     TRUE    192.12


Above a way to embed the code in an Excel application where the formula is in column A and the molecular weight is 
calculated in column C. Column B indicates if the formula is according to grammar. 
Parse follows semantics

Checking syntax means applying a set of rules to the input data. The route through the matching rules is thereby
independent of the actual meaning of the parsed input, unless we are able to evaluate that meaning. In EBNF+
we can.

The method to validate the syntax based on meaning is rather trivial in EBNF+. Just manipulate the Boolean
pOk and the parse route follows your command. The flavours are:

Setting pOk to False routes the parse as if the previous syntax check was invalid. Depending on the context, the
syntax check may continue with other alternatives in the rule, or the rule is simply found not applicable if there are no
alternatives.

Setting pOk to True routes the parse as if the previous syntax check was valid. This means that the result of the syntax check
so far did not matter. The previous syntax check was hence optional.

Setting pOk depending on some evaluation is the general and most useful form. It opens the way to decide whether input is
not only syntactically correct, but also meaningful. Most commonly one can check whether a parsed text denotes a value
that is within range, is even, odd, prime, round, black or purple. Such validation is often difficult or too lengthy to capture
in syntax rules. Now you can shortcut this.


Example 4: Check the validity of an IP address. 

In this example we first check if the IP address has the correct form, then we add a semantic action to check if 
the value of the elements is within the allowed range. 


Step 1: Create the rule (user input) 


digit = "0".."9";
number = digit , {digit};
IPadd = number          «pOk=Int(sLastIn)<256»
      , "." , number    «pOk=Int(sLastIn)<256»
      , "." , number    «pOk=Int(sLastIn)<256»
      , "." , number    «pOk=Int(sLastIn)<256» ;


Note the four semantic expressions between « and ». During the parse they are evaluated as part of the syntax check: is the
integer value of the last parsed string smaller than 256? If it is, the parser is signalled that all is OK with
respect to form (the input consists of digits that form a number) as well as with respect to meaning (the value is less than 256). If it is not
less than 256, the parser is signalled False: something is wrong. For the parser it does not matter whether the
False came from a mismatch in syntax or a mismatch in semantics.

The semantics-to-syntax notation is simple and looks natural. I am of course somewhat biased, but try to do this with
other parsers.

Further notice the structure of the program. It is like a chess board just before checkmate: any irregularity will get
noticed before execution.
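
To make the interplay concrete outside VBA, here is a minimal Python sketch of the same check. It is a hand-written model, not EBNF+ output: the function name parse_ip and the variable last_in (standing in for the last parsed string) are assumptions for this illustration.

```python
def parse_ip(text):
    """Return True when text is four dot-separated numbers, each below 256."""
    pos = 0
    for part in range(4):
        if part > 0:
            # Syntax: a "." must separate consecutive numbers.
            if text[pos:pos + 1] != ".":
                return False
            pos += 1
        start = pos
        # Syntax: number = digit , {digit};
        while pos < len(text) and text[pos] in "0123456789":
            pos += 1
        if pos == start:
            return False               # no digit found: syntax mismatch
        last_in = text[start:pos]      # the string parsed by the last rule
        # Semantics: the range check attached to each number.
        if not int(last_in) < 256:
            return False               # same False as a syntax mismatch
    return pos == len(text)            # all input must be consumed


for address in ("1.1.1.2", "169.123.233.255", "169.256.1.2", "1.2.3.4."):
    print(address, parse_ip(address))
```

Both kinds of failure leave through the same exit, which is the point of the example: the caller cannot tell, and does not need to tell, whether form or meaning was at fault.
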


Step 2: Compile the rule (EBNF+ output) 

Function digit()
   cPush
   Call pInterval("0", "9")
   cDrop
End Function
' digit = "0".."9";

Function number()
   cPush
   digit
   If pOk Then
      Do
         cPush
         digit
         cDrop
      Loop While pOk And Not blnEnd
      pOk = True
   End If
   cDrop
End Function
' number = digit , {digit};

Function IPadd()
   cPush
   number
   pOk = Int(sLastIn) < 256
   If pOk Then
      pIn (".")
   End If
   If pOk Then
      number
      pOk = Int(sLastIn) < 256
   End If
   If pOk Then
      pIn (".")
   End If
   If pOk Then
      number
      pOk = Int(sLastIn) < 256
   End If
   If pOk Then
      pIn (".")
   End If
   If pOk Then
      number
      pOk = Int(sLastIn) < 256
   End If
   cDrop
End Function
' IPadd = number          «pOk=Int(sLastIn)<256»
'       , "." , number    «pOk=Int(sLastIn)<256»
'       , "." , number    «pOk=Int(sLastIn)<256»
'       , "." , number    «pOk=Int(sLastIn)<256» ;


Step 3: Apply the grammar 

Cell B59: =parsestring(A59;B$56)

      A                  B
56                       IPAdd
57    1.1.1.2            TRUE
58    169.123.233.255    TRUE
59    169.256.1.2        FALSE
60    1.2.3.4.           FALSE


Example 5: Check the validity of the time of day

(e.g. "7:30am" or "17:30")


Step 1: Create the rule (user input)

digit = "0".."9";
number = {digit}*;

time = number           «pOk=Int(sLastIn)<12»
     , ":" , number     «pOk=Int(sLastIn)<60»
     , ("am"|"pm") |
       number           «pOk=Int(sLastIn)<24»
     , ":" , number     «pOk=Int(sLastIn)<60»;

In this example I used the }* repetition symbol (one or more repetitions). It shortens the grammar a bit; I always
wonder whether it really increases readability. Anyway, it did not extend the parser kernel, only a few lines in the
EBNF+.bnf file.
Something different and more important: note that with this grammar the whole input is parsed again
if no am or pm string is found at the end. If there were speed problems, one could specify the grammar
differently. However, unless you parse the Bible you are safe to do what you like; the tool is quite speedy.

Another note: what would happen if you interchange the alternatives? I am afraid it would always result in a syntax error
for am/pm input formats. We will discuss this peculiarity later. Just live with this rule:

"When applicable, specify the rules such that the long string is parsed first and then its left-truncated form(s)."

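The effect of the order of alternatives can be tried out with a small Python model of ordered choice. This is a sketch under assumptions, not the EBNF+ kernel: all function names are invented for the illustration, and the model simply takes the first alternative that matches and never retries a later one after that.

```python
def number_below(text, pos, limit):
    """Parse one or more digits at pos; return the new position or None."""
    end = pos
    while end < len(text) and text[end] in "0123456789":
        end += 1
    if end == pos or int(text[pos:end]) >= limit:
        return None
    return end


def literal(text, pos, word):
    """Match word at pos; return the new position or None."""
    return pos + len(word) if text.startswith(word, pos) else None


def clock(text, pos, hour_limit, suffixes):
    """hours ":" minutes, followed by one of the suffixes if any are given."""
    pos = number_below(text, pos, hour_limit)
    if pos is not None:
        pos = literal(text, pos, ":")
    if pos is not None:
        pos = number_below(text, pos, 60)
    if pos is None or not suffixes:
        return pos
    for word in suffixes:
        end = literal(text, pos, word)
        if end is not None:
            return end
    return None


def twelve_hour(text, pos):
    return clock(text, pos, 12, ("am", "pm"))


def twenty_four_hour(text, pos):
    return clock(text, pos, 24, ())


def parse_time(text, alternatives):
    """Ordered choice: the first alternative that matches wins."""
    for alternative in alternatives:
        end = alternative(text, 0)     # every attempt restarts at position 0
        if end is not None:
            return end == len(text)    # no retry once an alternative matched
    return False


LONG_FIRST = (twelve_hour, twenty_four_hour)
SHORT_FIRST = (twenty_four_hour, twelve_hour)

for text in ("9:55am", "12:15", "12:15am"):
    print(text, parse_time(text, LONG_FIRST), parse_time(text, SHORT_FIRST))
```

With the short form first, "9:55am" is matched as "9:55" by the 24-hour alternative, which leaves "am" unparsed, so the input as a whole is rejected.
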

Step 2: Compile the rule (EBNF+ output) 

Attribute VB_Name = "T013_TimeOfDay_Kleene"
' Compiled by: EBNF 0V55.xlsm
' Date: 19/06/2014 17:24:23
Option Explicit

Function digit()
   cPush
   Call pInterval("0", "9")
   cDrop
End Function
' digit = "0".."9";

Function number()
   cPush
   cPush
   Do
      cPush
      digit
      cDrop
   Loop While pOk And Not blnEnd
   pOk = (aContext(iContext, 1) <> iInPnt)
   iContext = iContext - 1
   cDrop
End Function
' number = {digit}*;

Function time()
   cPush
   number
   pOk=Int(sLastIn)<12
   If pOk Then
      pIn(":")
   End If
   If pOk Then
      number
      pOk=Int(sLastIn)<60
   End If
   If pOk Then
      cPush
      pIn("am")
      If Not pOk Then
         cTop
         pIn("pm")
      End If
      cDrop
   End If
   If Not pOk Then
      cTop
      number
      pOk=Int(sLastIn)<24
      If pOk Then
         pIn(":")
      End If
      If pOk Then
         number
         pOk=Int(sLastIn)<60
      End If
   End If
   cDrop
End Function
' time = number           «pOk=Int(sLastIn)<12»
'      , ":" , number     «pOk=Int(sLastIn)<60»
'      , ("am"|"pm") |
'        number           «pOk=Int(sLastIn)<24»
'      , ":" , number     «pOk=Int(sLastIn)<60»;

Some light on the compilation result of the }* repetition clause.

At the end of the clause, after the Loop While line, the parser checks whether it has progressed in the input data. iInPnt is the pointer
into sIn that indicates where the next string will be parsed.

aContext(iContext, 1) is the first value of the stack entry used to remember the parser status for backtracking.
This value, as a smart engineer may guess, is the old iInPnt as it was at the start of the clause, pushed by the second cPush in
the function. If the old pointer differs from the new one, then something was parsed successfully within the clause, i.e. it was not
empty. Note the fun of having pOk: the success is passed on to this Boolean.

The second cPush is balanced by decrementing the stack pointer in iContext = iContext - 1. Not so elegant, but forgive me, I
needed a stack. The cDrop has parser functionality, which I did not like to touch.
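
The progress check can be shown in isolation with a short Python sketch. It is a model of the idea only, with assumed names (one_or_more, digit); positions are zero-based here, unlike the one-based pointer of the VBA kernel.

```python
def one_or_more(parse_item, text, pos):
    """Apply parse_item repeatedly; succeed only if the position advanced."""
    start = pos                        # the old pointer, saved before the loop
    while pos < len(text):
        nxt = parse_item(text, pos)
        if nxt is None or nxt == pos:
            break                      # no match, or no progress: stop looping
        pos = nxt
    ok = pos != start                  # success means the clause was not empty
    return ok, pos


def digit(text, pos):
    """Return pos + 1 when a decimal digit sits at pos, else None."""
    if pos < len(text) and text[pos] in "0123456789":
        return pos + 1
    return None


print(one_or_more(digit, "123ab", 0))
print(one_or_more(digit, "ab", 0))
```

Comparing the saved position with the current one is all that distinguishes this clause from the plain repetition, which accepts zero matches.
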


Step 3: Apply the grammar

Cell B71: =parsestring(A71;B$68)

      A          B
68               time
69    09:45pm    TRUE
70    9:55am     TRUE
71    9:60pm     FALSE
72    12:15      TRUE
73    12:15am    FALSE
74    12:15pm    FALSE
75    24:31      FALSE
76    03:10      TRUE


State machines for co-operative parsing 

The notion of concurrent state machines is added to EBNF+ to simplify models of real-time systems with
multiple protocols acting at the same time. It avoids the long enumerations of input combinations
that result from multiple sessions running at the same time over one line. Having concurrency in the EBNF+ tool (or
actually the simulation of it) makes the life of the developer a lot easier.

State machines help to understand and manage real-time systems. By real-time we mean not only acting within
a specific time on external events, but also controlling multiple independent or co-operating processes at the
same time.

EBNF+ supports state machines with minimal effort using one extra symbol: the hash (#).

As an example, consider a payment system where each payment is performed in two steps: the proposal to pay and
the acceptance of this proposal. In EBNF+ this can be declared as a state machine:

Payment1 # proposal, acceptance;

Here the hash symbol states that Payment1 is the name of a state machine with two states: proposal
and acceptance. The comma symbol signals that acceptance (with state accepted) can only be reached when
proposal (with state proposed) is done. The right-hand side of the declaration resembles a syntax rule where
the term acceptance is parsed only after a successful proposal. If the right-hand side were a
grammar rule, its parse would be unsuccessful when one of the terms is not found in the input. For the state
machine definition this is the same except for one essential difference: if the term is not found in the input, the
state machine simply waits, that is, it returns execution to the scheduler (the process that called the state
machine) until it is called again and resumes where it left off. While it waits, the scheduler can call other state machines. This
concept becomes meaningful with multiple active state machines. So let's add a second
payment machine to the example and have a payment system where both payments are allowed to run continuously
in parallel:

Payment2 # proposal, acceptance;

PaymentSystem = {Payment1 | Payment2};


PaymentSystem is an ordinary EBNF declaration and indeed nothing more than that. But the effect is that this
non-terminal (procedure) continuously calls the state machines one after the other. If one state machine
comes into a state where it cannot parse the input any further, the other state machine is called. If the second
state machine stops, the first one is resumed where it left off, and so forth. The repeated
sequence then simulates both machines running in parallel, concurrently parsing the input. PaymentSystem
is the scheduler for the two state machines here.


The above can be worked out further to analyse the traffic of real payment
systems. Just fill in the syntax details of the payment and accept
messages of the protocol used by the particular payment system
and run it. A worked-out example for the ISO 8583 payment protocol is
given in [6]. As far as you have memory and processing power, you can
add as many processes/state machines as you like.

For demonstration we simplify the terms by using single-byte messages
in the two state machines and add a scheduler.

Ses1 # "a","b";

Ses2 # "d","e";

Trxs = Ses1(True), Ses2(True), {Ses1|Ses2}*;

As a courtesy to proper design, the terms Ses1(True) and Ses2(True) set the
state machines Ses1 and Ses2 to their initial state; one should not trust that
they have their initial state at declaration. Also during debugging, states may
become undefined.

Compile the three lines, load the result into Excel and apply it to some input
as below. You have now modelled one of the trickiest phenomena in
programming: concurrency. The figure below shows the result as an Excel
application.


Cell B3: =Parsestring(A3;$B$2)

      A                    B
1     Ses1 # "a","b";      Trxs = Ses1(True), Ses2(True),
      Ses2 # "d","e";             {Ses1|Ses2}*;
2     input                Trxs
3     abde                 TRUE
4     adeb                 TRUE
5     adbe                 TRUE
6     dabe                 TRUE
7     daeb                 TRUE
8     deab                 TRUE
9     abed                 FALSE
10    abde                 TRUE
11    aedb                 FALSE
12    abdeab               TRUE
13    adbaeb               TRUE
14    abdeddee             FALSE
15    abdeaba              TRUE
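
The waiting behaviour of the two sessions can be imitated in a few lines of Python. This is a sketch of the idea, not the code EBNF+ generates: the class StateMachine, its methods and the function schedule are assumptions for this illustration. Each machine either consumes the offered symbol or leaves it for the next machine.

```python
class StateMachine:
    """A cyclic sequence of expected symbols that remembers where it is."""

    def __init__(self, *symbols):
        self.symbols = symbols
        self.state = 0

    def reset(self):
        self.state = 0                 # back to the initial state

    def offer(self, symbol):
        """Consume symbol if it is the expected one; otherwise just wait."""
        if self.symbols[self.state] != symbol:
            return False
        self.state = (self.state + 1) % len(self.symbols)
        return True


def schedule(text, machines):
    """Offer each input symbol to the machines in turn."""
    for machine in machines:
        machine.reset()
    for symbol in text:
        # any() stops at the first machine that consumes the symbol.
        if not any(machine.offer(symbol) for machine in machines):
            return False               # no machine can take this symbol
    return True


ses1 = StateMachine("a", "b")
ses2 = StateMachine("d", "e")
for text in ("abde", "adbe", "abed", "abdeddee", "abdeaba"):
    print(text, schedule(text, [ses1, ses2]))
```

Interleaved input such as "adbe" is accepted because each machine keeps its own state while the other one is served.
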


Scheduler types

In the above example we used the bar symbol (|) for the scheduler, which indicates that the right-hand side of the
symbol is scheduled as an alternative to the left-hand side. For two independent processes, such as in the payment
example, this is applicable. However, if the right-hand side should only be scheduled once the left-hand side is
finished, one should use the comma sign (,), which indicates concatenation of processes.

The figure below shows the principal results of the alternative and concatenation schedulers.

Cell B39: =Parsestring(A39;$B$35)

      A                  B               C              D              E
34    sm1 # "a", "b";
35    sm2 # "c", "d";    SesOr           SesAnd         SesOrRep       SesAndRep
36    input              = sm1 | sm2;    = sm1, sm2;    = {sm1|sm2}    = {sm1,sm2}
37    a                  TRUE            TRUE           TRUE           TRUE
38    ab                 TRUE            TRUE           TRUE           TRUE
39    abc                FALSE           TRUE           TRUE           TRUE
40    abcd               FALSE           TRUE           TRUE           TRUE
41    acbd               FALSE           FALSE          TRUE           TRUE
42    acdb               FALSE           FALSE          TRUE           TRUE
43    dabc               FALSE           FALSE          FALSE          FALSE
44    cd                 FALSE           FALSE          TRUE           FALSE
45    cdab               FALSE           FALSE          TRUE           FALSE
46    cadb               FALSE           FALSE          TRUE           FALSE
47    cdabacdb           FALSE           FALSE          TRUE           FALSE
48    cdabacdbf          FALSE           FALSE          FALSE          FALSE

FAQ on state machines

Q. Does EBNF+ support nesting of state machines?

A. Yes. E.g.:

Ses1 # "a", "b";

Ses2 # "c", Ses1 | "d";

Q. Does EBNF+ support recurrence?

A. No. An EBNF+ state machine declaration should be seen as the declaration of a single object, not the definition of
a class.


Q. Can one add semantics to the expression?

A. Yes. The full host language is available, as in the rest of the EBNF+ expressions. Also, the variables declared within the
state machine are non-volatile: they keep their value throughout execution of the state machine, independent
of when other machines process.


Q. Can one combine state machine expressions with other EBNF+ syntax symbols?

A. All symbols are allowed except the exception symbol (-). Note that the parse of state machine expressions has no
backtrack mechanism, so here the parser becomes LR(0).
6. EBNF+ defined by itself


A popular introduction to EBNF is the syntax definition of EBNF in EBNF itself. Let's examine the EBNF definition
of an EBNF grammar rule. A rule is made up of an identifier, the equals sign, a list of definitions, and is closed by
a semicolon:

Rule = Identifier , '=' , DefinitionsList , ';' ;

As you see, it really describes its own form. A bit spooky, like "I think, hence I am".

For EBNF+ we take this self-definition idea a step further. We not only self-define the syntax but also the
linked semantics! Self-definition in software terms means that we have at one instant a tool and its source; the tool
then compiles the source into an executable program that is the tool itself again.

Self-definition of computer languages is not new; popular examples are LISP, Prolog and Forth. Many
assemblers and simple language compilers no doubt bootstrap themselves as well. What is probably new with EBNF+ is that the
self-definition is not very deep under the hood and that you as user have freedom in the choice of syntax. Self-definition
gives great flexibility in how the language should look and what it should do.

Self-definition may sound like magic, but it is not, once you see the mechanism. The trick is to find the minimal
cut through all solutions. We will end up with a handful of variables and a page of code, and will always wonder
why one would do it differently.

OK, enough boasting. Let's have the meat.


Step 1: We write the definition of pRule in EBNF+ (Source A)

pRule = pOut ('Function ')
      , pIdentifier , pOut ('()')
      , pS , '=' , pOut (vbCrLf & 'cPush')
      , DefinitionsList
      , ';' , pOut (vbCrLf & 'cDrop')
      , pOutIndSub ('End Function');

The prefix letter p is just there to have unique identifiers. pS is the non-terminal that defines what space is.
Simplified definition: pS = {" "};

The function pOut just appends its argument to a string sOut.

Step 2: When this rule is compiled by the tool you get (Object A):

Function pRule()
   cPush
   pOut ("Function ")
   If pOk Then
      pIdentifier
   End If
   If pOk Then
      pOut ("()")
   End If
   If pOk Then
      pIn ("=")
   End If
   If pOk Then
      pOut ("cPush")
   End If
   If pOk Then
      DefinitionsList
   End If
   If pOk Then
      pIn (";")
   End If
   If pOk Then
      pOut ("cDrop")
   End If
   If pOk Then
      pOut ("End Function")
   End If
   cDrop
End Function

Now take a really close look at this. The first word "Function" is actually appended as output by the call
pOut ("Function ") in the third line. The second word "pRule" is appended as output in the 5th line by the
non-terminal pIdentifier, more precisely by the rule/function with the name "pIdentifier" defined somewhere else.
The opening and closing parentheses are appended ... etc.
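
The translation of rule text into function text can be tried on a much smaller scale with the Python sketch below. It is a toy under stated assumptions, not the EBNF+ source: it accepts only rules of the form Name = term , term ... ; where a term is a double-quoted terminal or an identifier, and the names tokenize and compile_rule are invented for the illustration. The emitted text follows the style of Object A.

```python
import re

# Tokens of the toy rule language: identifiers, quoted strings, "=", "," and ";".
TOKEN = re.compile(r'\s*([A-Za-z]\w*|"[^"]*"|[=,;])')


def tokenize(source):
    """Split the rule text into tokens; raise ValueError on anything else."""
    tokens, pos = [], 0
    source = source.rstrip()
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise ValueError("unexpected text at position %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def compile_rule(source):
    """Translate 'Name = term , term ... ;' into VBA-style function text."""
    tokens = tokenize(source)
    if len(tokens) < 4 or tokens[1] != "=" or tokens[-1] != ";":
        raise ValueError("expected: Name = term {, term} ;")
    name, body = tokens[0], tokens[2:-1]
    terms, separators = body[0::2], body[1::2]
    if (len(body) % 2 == 0
            or any(sep != "," for sep in separators)
            or any(term in ("=", ",", ";") for term in terms)):
        raise ValueError("terms must be separated by single commas")
    out = ["Function %s()" % name, "cPush"]
    for position, term in enumerate(terms):
        # A quoted string becomes a terminal match, an identifier a rule call.
        call = "pIn(%s)" % term if term.startswith('"') else term
        if position == 0:
            out.append(call)
        else:
            out += ["If pOk Then", "   " + call, "End If"]
    out += ["cDrop", "End Function"]
    return "\n".join(out)


print(compile_rule('greeting = "hi" , name ;'))
```

The sketch only shows the direction of the translation. The real tool goes one step further, because the translator itself is written as such rules.
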

Step 3: Execute pRule

When this code executes on Source A you will get Object A.

Wait a minute...

This is Step 2 again! Welcome to the trick of self-definition!

In practice we can add, drop or optimise parts of the source for each compilation. The tool then evolves with every
iteration. Currently the tool is indexed 0V55; the number 55 means it is the 55th iteration since its existence.
Probably I'll cheat one time and skip to 1V0 to make you believe it has no bugs.

Below follow a description of the kernel functions and the complete requirements specification, which is in effect the
source of the EBNF+ tool, to be compiled by the tool itself.


Grammars 

Time to go one level higher, to the grammar, which happens to be the top level.

A grammar is only a set of rules:

pGrammar = {pRule} ;

To allow some freedom in page format, the EBNF+ definition is actually:

pGrammar = pS , { pRule , pS } ;

where pS is any sequence of spaces, control characters and further non-grammar text such as comments and
action specifications.

As you saw, the EBNF+ tool makes a subroutine with the same name from pGrammar. Now what happens if you call
this routine? Indeed, it will parse a grammar file: the set of rules that defines a language. For the self-definition
of EBNF+ the grammar file is also extended with actions, so it will not only check the syntax of the grammar but also
compile/interpret the syntax/semantics.

For the code-oriented reader, we can find the self-definition in the code. The function ParseFile is called when
you want to compile a grammar. Below is the complete code. Note the call in the middle of it. It is the call to the
grammar rule, named pGrammar, which actually parses the input file. The rest of the function is less impressive.
At the start we set some variables that initialise parsing, among them one that opens the source file and assigns its
content to sIn. When the call returns from pGrammar, ParseFile displays whether the file was
successfully parsed. If the parse was unsuccessful, ParseFile displays an error message at the point where the
grammar could not produce the input file, or in more practical terms, where the grammar could not parse
the input.

Sub ParseFile()
   Dim sFPN As String
   Dim sFN As String
   Dim oFile As Object
   Dim i As Integer

   ' Ask the user for the grammar source file.
   sFPN = Application.GetOpenFilename( _
      FileFilter:="EBNF Source Files, *.bnf", _
      Title:="Select EBNF Source File", _
      MultiSelect:=False)
   If Len(sFPN) < 8 Then Exit Sub 'Later change sFPN type to Variant

   ' Derive the module name from the file name (path and extension removed).
   sFN = Mid(sFPN, Len(CurDir) + 2, Len(sFPN) - Len(CurDir) - 5)
   For i = 1 To Len(sFN)
      ' Replace spaces; the replacement character is assumed to be "_".
      If Mid(sFN, i, 1) = " " Then sFN = Left(sFN, i - 1) & "_" & Mid(sFN, i + 1, 63)
   Next

   ' Start the output with the module header.
   sOut = "Attribute VB_Name = " & Chr(34) & sFN & Chr(34) & vbCrLf
   sOut = sOut & "' Compiled by: " & ActiveWorkbook.Name & vbCrLf
   sOut = sOut & "' Date: " & Now() & vbCrLf
   sOut = sOut & "Option Explicit" & vbCrLf

   ' Load the source into sIn and parse it with the top-level rule.
   InitParser (FileIn(sFPN))
   pGrammar

   If pOk And blnEnd Then
      MsgBox "Source grammar Ok."
   Else
      ' Show up to 30 characters before and 40 after the furthest point reached.
      MsgBox "Grammar not OK: .." & Mid(sIn, Max(iInPntMax - 30, 1), _
         Min(iInPntMax - Max(iInPntMax - 30, 1), 30)) & "->ERROR->" & Mid(sIn, iInPntMax, 40) & ".."
   End If

   ' Export the output, complete or not, as a *.bas file next to the source.
   Call FileOut(Left(sFPN, Len(sFPN) - 3) & "bas", sOut)
End Sub
Kernel variables and functions


String sIn and Integer iInPnt

As seen in the examples, each source file (*.bnf) is a normal text file expressed in EBNF+ grammar. At the start
of the parse the complete file is copied into sIn.

iInPnt is the integer that points to the first character to be parsed. It starts at one, even if sIn is empty. At
the start of the parse iInPnt is 1; at the end of a successful parse it points immediately after the last character
in sIn.


String sOut

The EBNF+ tool compiles the source into another text file in the host language. In the examples the host language
is VBA, so the output is a *.bas file. This file can then be imported as a module in Excel (or VB) for your application.

During EBNF+ compilation a string grows, and on completion this string (sOut) is exported as the *.bas file.
When an error is found the *.bas file is also exported, but only as far as it was completed. On screen an error message
indicates the spot in the source after which the parse went wrong.

Since almost the only thing that happens to the string sOut is that something is appended to it, the parser uses no
pointer for sOut.


Function pOut

In EBNF+, the compiling process translates input (string sIn) into output data (string sOut), in conformance with the combined
EBNF+ syntax-semantic specification. A typical sequence in this specification looks like:

syntax-expression, pOut(StringToAddTo_sOut)..

If the syntax-expression parses its part of sIn successfully, the function pOut adds the string to sOut. Again
note that the comma states that the next statement is only executed when the parse at that point is valid.

Now you may check your guess for the definition of pOut:

Function pOut(sAdd As String)
   sOut = sOut & sAdd
End Function

An example is:

pOut('Loop While pOk And Not blnEnd')

which generates the end-of-loop statement for the EBNF repetition clause. This is also the longest sOut addition
per pOut statement used.

Note that pOut concatenates strings to sOut. New lines are not added to sOut automatically. You may add new lines in
the object code using CrLf characters or else make use of the pretty printing functions.


Pretty printing functions

When generating program code, indentation can be used to give it a nicely structured appearance. For this, three further
variants of pOut are available, where the Integer variable iNdent holds the number of spaces by which lines are
indented at the current nesting level.

To add a text line at the current nesting level, use pOutInd:

Function pOutInd(sTerm As String)
   sOut = sOut & vbCrLf & Space(iNdent) & sTerm
End Function

To add a text line and then nest one level deeper, use pOutIndAdd:

Function pOutIndAdd(sTerm As String)
   sOut = sOut & vbCrLf & Space(iNdent) & sTerm
   iNdent = iNdent + 3
End Function

To go one level less deep and then add a text line, use pOutIndSub:

Function pOutIndSub(sTerm As String)
   iNdent = iNdent - 3
   sOut = sOut & vbCrLf & Space(iNdent) & sTerm
End Function

Indentation per level is 3 spaces. I am Dutch so a bit sparse on space. You may use 21 if you live in Norway.
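
How the three variants cooperate is easiest to see when they are used together. The Python sketch below mirrors them in a small class; the class name Emitter and its method names are assumptions for this illustration, and "\n" is used where the VBA uses vbCrLf.

```python
class Emitter:
    """Collects output text, indenting each line by the current nesting level."""

    STEP = 3                           # spaces per nesting level

    def __init__(self):
        self.out = ""
        self.indent = 0

    def line(self, text):
        """Add a line at the current level."""
        self.out += "\n" + " " * self.indent + text

    def open(self, text):
        """Add a line, then nest one level deeper."""
        self.line(text)
        self.indent += self.STEP

    def close(self, text):
        """Go one level less deep, then add a line."""
        self.indent -= self.STEP
        self.line(text)


emitter = Emitter()
emitter.open("If pOk Then")
emitter.line('pIn(".")')
emitter.close("End If")
print(emitter.out)
```

Because the opening variant indents after writing and the closing variant unindents before writing, the opening and closing lines of a block end up in the same column.
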


Functions cPush, cTop, and cDrop

As we saw, sOut is the string that is built up during the parse to deliver the final output after a successful parse.
During the parse, sOut does not only grow. It can also shrink, when a rule or part of a
rule could not be applied, that is, when the parse hits the famous blind alley. In that case the parse status must be set
back to the status before the parse took this alley.

Of much interest is where in the parser source execution can go into a possible blind alley. These places are defined in the
EBNF+ grammar rules! There we see that backtracking is possible in the rules for:

OptionalSequence   (in the [...] construction)
RepeatedSequence   (in the {...} construction)
GroupedSequence    (in the (...) construction)
pRule              (in the definition of a non-terminal).

The backtrack mechanism is performed by three functions, cPush, cTop and cDrop. Since this mechanism should
operate in nested situations, it uses a stack structure, named aContext, to store the parse status. All functions are, as
their names indicate, stack operators, allowing nesting of syntax rules and backtracking any level deep as long as you
have space. In the example the stack is 100 levels deep, which seems sufficient for general use of the parser (research
needed here):

Public aContext(1 To 100, 1 To 5)
Public iContext As Integer

Each stack entry holds 5 values, which together form a parse status. The values are:

iInPnt, the pointer into the input data where the next parse will continue

blnEnd, the Boolean that is true when the pointer has reached the end of the input data, that is,
blnEnd = (iInPnt > Len(sIn))

Len(sOut): since each rule can only add text to sOut, backtracking only has to remember the length of sOut.
Truncating the added text is the only thing needed at a backtrack

sLastIn, the string in the input as successfully parsed by the last terminal or non-terminal rule

iNdent, the number of spaces to indent for pretty printing

The three functions are:


cPush, used to remember the parse status in case the optional rule or part of the rule is invalid

Function cPush()
   iContext = iContext + 1
   If iContext > 100 Then MsgBox "Context stack overflow"
   aContext(iContext, 1) = iInPnt
   aContext(iContext, 2) = blnEnd
   aContext(iContext, 3) = Len(sOut)
   aContext(iContext, 4) = sLastIn
   aContext(iContext, 5) = iNdent
End Function
cTop, used to set the parse status back in case the rule or part of it was invalid

Function cTop()
   iInPnt = aContext(iContext, 1)
   blnEnd = aContext(iContext, 2)
   sOut = Left(sOut, aContext(iContext, 3))
   sLastIn = aContext(iContext, 4)
   iNdent = aContext(iContext, 5)
End Function

cDrop, used to drop the copy of the parse status, since the parsed option was valid

Function cDrop()
   iInPntMax = Application.Max(iInPnt, iInPntMax)
   If pOk Then
      sLastIn = Mid(sIn, aContext(iContext, 1), iInPnt - aContext(iContext, 1))
   Else
      cTop
   End If
   iContext = iContext - 1
End Function


Notes:

The functions cPush and cDrop should be in balance, unless someone is manipulating iContext.

sLastIn holds the last string that was successfully parsed by a terminal or non-terminal. It is used extensively for
semantic operations. It is the only parameter the syntax communicates to the semantics! Sharing is one-way: the syntax
picks it up from the input sIn and the semantic part can use it.

cDrop generates the string sLastIn for non-terminals. cDrop takes it from sIn, from the iInPnt stored at the start of the
execution of the non-terminal up to the end of that non-terminal. That is, if the parse was successful.
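
The push, restore and drop discipline can be modelled compactly in Python. The sketch below is a simplification under assumptions: it saves only two of the five values (input position and output length), uses zero-based positions, and the class Context and the rule greeting are invented for the illustration.

```python
class Context:
    """Parse status plus a stack of saved copies for backtracking."""

    def __init__(self, text):
        self.text = text
        self.pos = 0                   # position of the next character to parse
        self.out = ""                  # generated output so far
        self.ok = True                 # parse success flag
        self.stack = []

    def push(self):
        """Remember the current status."""
        self.stack.append((self.pos, len(self.out)))

    def top(self):
        """Restore the remembered status but keep the saved copy."""
        self.pos, out_len = self.stack[-1]
        self.out = self.out[:out_len]  # output only grows, so truncating suffices

    def drop(self):
        """Restore on failure, then discard the saved copy."""
        if not self.ok:
            self.top()
        self.stack.pop()

    def match(self, term):
        """Match a terminal at the current position."""
        self.ok = self.text.startswith(term, self.pos)
        if self.ok:
            self.pos += len(term)


def greeting(ctx):
    """greeting = ("hello" , "!") | "hello" ;  with output on each branch."""
    ctx.push()
    ctx.match("hello")
    if ctx.ok:
        ctx.out += "<excited>"
        ctx.match("!")
    if not ctx.ok:
        ctx.top()                      # back to the start; "<excited>" is removed
        ctx.match("hello")
        if ctx.ok:
            ctx.out += "<calm>"
    ctx.drop()


for text in ("hello!", "hello", "bye"):
    ctx = Context(text)
    greeting(ctx)
    print(text, ctx.ok, ctx.pos, repr(ctx.out))
```

For the input "hello" the first alternative writes output before it fails on the missing "!". Restoring the saved output length removes that output again, which is why remembering the length is enough.
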


Boolean variable pOk

Let pOk be the state that signals parse success or defeat, starting with True. The state can change during the
evaluation of a grammar rule. It determines further parser behaviour in the grammar, such as:

"|" (the alternative is skipped if pOk is true)

"," (the parse continues if pOk is true).


Function pIn and Boolean blnEnd

Parsing is at heart the check that part of the input data matches some candidate terminal string (sTerm) proposed
in a rule. pIn has the honour of performing this service. It is therefore the most complex function in the EBNF+ machine:

Function pIn(sTerm As String)
   pOk = (Mid(sIn, iInPnt, Len(sTerm)) = sTerm)
   If pOk Then
      iInPnt = iInPnt + Len(sTerm)
      blnEnd = (iInPnt > Len(sIn))
      sLastIn = sTerm
   End If
End Function

Almost trivial, would you not agree?

The Boolean blnEnd is the semaphore that signals that we have reached beyond the end of the input. If both pOk and blnEnd
are true after a parse, then sIn conforms to the specified grammar. Note the strictness: one character too few or too
many and EBNF+ will blow the whistle.
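
The strictness can be checked with a small Python model of the terminal match and the end-of-input flag. The class Input, its attribute names and the helper accepts are assumptions for this sketch, as is the choice to start with the end flag set only for empty input. The pointer is kept one-based to stay close to the text.

```python
class Input:
    """Input string with a one-based pointer and an end-of-input flag."""

    def __init__(self, text):
        self.text = text
        self.pos = 1                   # one-based pointer to the next character
        self.at_end = len(text) == 0   # assumed initial value of the end flag
        self.ok = True
        self.last = ""

    def p_in(self, term):
        """Match term at the pointer; advance only on success."""
        start = self.pos - 1           # Python strings are zero-based
        self.ok = self.text[start:start + len(term)] == term
        if self.ok:
            self.pos += len(term)
            self.at_end = self.pos > len(self.text)
            self.last = term


def accepts(text, terms):
    """True when the terms, in sequence, produce exactly the whole text."""
    state = Input(text)
    for term in terms:
        state.p_in(term)
        if not state.ok:
            return False
    return state.ok and state.at_end


print(accepts("ab", ["a", "b"]))
print(accepts("abc", ["a", "b"]))
print(accepts("a", ["a", "b"]))
```

A successful match of every term is not enough; the pointer must also have moved past the last character, which is what rejects "abc" here.
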


Source of EBNF+ in EBNF+ 

Below is the commented full source of the EBNF+ tool (version 24), both syntax and semantics.

pLetter = 'a'..'z' | 'A'..'Z' | ; 

pDigit = '0'..'9' ; 
pSymbol =


']' I I '}' I ' ( ' I ' ) ' I ' <' I ' >' I ' =' I 'I' I I I A 35 | 

'+' I I I ' & ' I ' A ' I I f \ f I I '?' I ' @ ' I ’ / ' ; 


pCtrlChar = ^09 | ^10 | ^13 ;

So horizontal tabs, line feeds and carriage returns are considered control characters. For the EBNF+ grammar they are 
considered as space that may delimit terminals. 

pCharacter = pLetter | pDigit | pSymbol | pCtrlChar | " "; 

pOperator = pS , ('&' | V | ) , pOut ('' & sLastln & ''); 

pComment = '(*', {('*)' | pCharacter)-'*)'}, '*)'; 

The above handles the closing of the comment by the closing symbol '*)'. The last part ", '*)'" will always be encountered
in the parse, since it is the only escape out of the comment. You may now wonder why we then specify it: simply to
get past the closing symbol.

Also note that pCharacter includes control characters. So your comment can span many lines and pages, as long as you
close it with '*)'.


pCode = '«'                                          , pOutInd ('')
      , {('»' | pCharacter | '"' | "'") - '»'        , pOut (sLastIn)
        },
        '»' ;

pCode2 = '<.'                                        , pOutIndAdd ('If pOk Then' & vbCrLf)
       , {('.>' | pCharacter | '"' | "'") - '.>'     , pOut (sLastIn)
         },
         '.>'                                        , pOutIndSub ('End If');

pCode2 statements are enclosed in the source between the '<.' and '.>' symbols; they are translated to the host language
literally and enclosed by If and End If statements.


pS = { ' ' | pCtrlChar | pCode | pCode2 | pComment}; 

Space and other text layout characters are handled in EBNF+ just like any other data. This leads to a hassle-free and
formal dispatch of these characters by adding rules to the grammar. This property allows parsing of expressions
based on textual layout, and of non-text input such as communication data and different file formats.

When your syntax depends on the layout, you can specify this in EBNF+. Any sequence can be programmed
as part of the grammar. Whether remarks should be closed at a new line, whether space is optional, whether a
function declaration should start on a new line: all this can be explicitly specified in EBNF+.


pInteger = pS
         , pDigit          , pOut (sLastIn)
         , { pDigit        , pOut (sLastIn)
           } ;


pIdentifier = pS , pLetter     , pOut (sLastIn)
            , { pLetter        , pOut (sLastIn)
              | pDigit         , pOut (sLastIn)
              } ;



pString = pS, 

If V VV 1 II 1 II 1 If 

, pOut 

('Chr(34) 

') 


1 ps. 

1 II 1 

, pOut 

( 1 " ') 




, { (pCharacter|"'") 

, pOut 

(sLastln) 




} , "" 

, pOut 

( i ii i ) 



1 pS, 

II 1 II 1 II 1 

, pOut 

( 'Chr(34) 

& " 

') 


, { pCharacter 

, pOut 

(sLastln) 




r" 1 

, pOut 

( '" & Chr 

(34) 

& 


} , ,,,n 

, pOut 

( i ii i ) 



1 pS 

u i ii 

, pOut 

( i ii i ) 




, { pCharacter 

, pOut 

(sLastln) 




1 , „ , 

, pOut 

( '" & Chr(34) 

& 


} , " '" , pOut ( "" ) ; 


Above String definition could be smaller, but I could not resist to optimize the output and allow double and single 
quotes anywhere in the input string. 


pTerminal = pS, ' A ' 

,{pDigit 

}, pS, '..', pS, ' A ' 


{pDigit 

} 


, pOut ('Call plnterval(Chr(') 
, pOut (sLastln) 

, pOut ( ' ),Chr(') 

, pOut (sLastln) 

, pOut ('))') 
pS, '

A , 



, pOut 

('pin(Chr ( ') 


, {pDigit 



, pOut 

(sLastln) 


} 



, pOut 

('))') 

pS, " 

V ff 1 II 1 II 1 II 



, pOut 

( 'pin(Chr (34)) ' ) 

pS, " 

! II 



, pOut 

('Call plnterval(') 


,(pCharacter 



, pOut 

('"' & sLastln & '"') 


l ,,M 



, pOut 

( ' " ' ) 


) 

" ", pS, ' .. ' 

, ps, '"" 

, pOut 

(\ ') 


, (pCharacter 



, pOut 

('"' & sLastln & '"') 


l ,,M 



, pOut 

( ' " 1 ) 



VI V VI 


, pOut 

(')') 

pS, " 

V vv 



, pOut 

(' pin ('" ) 


,{(pCharacter 



, pOut 

(sLastln) 


1 " 

1 1 


, pOut 

('" & Chr(34) & "') 



)} , . 


, pOut 

("" & ' ) ' ) 

ps, ' 

II V 



, pOut 

('Call plnterval ("') 


,{ (pCharacter|"' 1 

") 


, pOut 

(sLastln) 



}, ,M \pS, '. 

.',pS, '" 1 

' , pOut 

("V" ) 


, { (pCharacter|" 

'") 


, pOut 

(sLastln) 

} 

f II 1 



, pOut 

( 1 ") 1 ) 

ps, ' 

II 1 



, pOut 

CpInC") 


, { (pCharacter|" 

,n J 


, pOut 

(sLastln) 



} , "" 


, pOut 

( 1 ") 1 ); 


Again this rule could be much smaller, but it minimises output, handles mixed use of double and 
single quotes in strings and the specification of character intervals. 

pFunction = pIdentifier,[ pS,'(' , pOut (sLastIn)
          , (pStringExp|pInteger) , pS, ')' , pOut (sLastIn)] ;

pStringExp = ( pString | pFunction | pInteger ), { pOperator, ( pString | pFunction | pInteger ) };


OptionalSequence = pS, '['                  , pOutInd ('cPush')
                 , DefinitionsList, pS, ']' , pOutInd ('pOk = True')
                                            , pOutInd ('cDrop');


RepeatedSequence = pS, '{'                  , pOut ('Do') , pOutIndAdd (' cPush')
                 , DefinitionsList, pS, '}' , pOutInd ('cDrop')
                                            , pOutIndSub ('Loop While pOk And Not blnEnd')
                                            , pOutInd ('pOk = True');


KleeneSequence = pS, '{'                    , pOutInd ('cPush')
                                            , pOutInd ('Do') , pOutIndAdd (' cPush')
               , DefinitionsList, pS, '}*'  , pOutInd ('cDrop')
                                            , pOutIndSub ('Loop While pOk And Not blnEnd')
                                            , pOutInd ('pOk = (aContext(iContext, 1) <> iInPnt)')
                                            , pOutInd ('iContext = iContext - 1');


GroupedSequence = pS, '('                   , pOutInd ('cPush')
                , DefinitionsList, pS, ')'  , pOutInd ('cDrop');


Primary = pS                                , pOutInd ('')
        , ( OptionalSequence |
            RepeatedSequence |
            pTerminal        |
            GroupedSequence  |
            pFunction ) ;


Exception =        pOutIndAdd ('If pOk Then')
                 , pOutInd ('cPush')
                 , pOutInd ('iInPnt = iInPnt - Len(sLastIn)') ,
            Term , pOutInd ('cDropExcept')
                 , pOutIndSub ('End If');


As expected, the nicest EBNF symbols are the nastiest to program. The exception symbol ends with cDropExcept, which works like cDrop but the other way around: it inverts the parse result. It also updates sLastIn. Its mechanics are discussed further below under Exception-expression.

Factor = Primary | «pOutInd ("vPush(")»
                   pInteger «pOut (")")»
                   «pOutIndAdd ("Do While vTop > 0")»
                   , pS, '*' , Primary
                   «pOutInd ("DecvTop")»
                   «pOutIndSub ("Loop")»
                   «pOutInd ("vDrop")» ;


Term = Factor , [ pS , '-' , Exception ]; 


SingleDefinition = Term , { pS , ','   , pOutIndAdd ('If pOk Then')
                          , Term       , pOutIndSub ('End If')
                          } ;


DefinitionsList = OrDefinitionsList | AndDefinitionsList ;


OrDefinitionsList = SingleDefinition , { pS , '|'     , pOutIndAdd ('If Not pOk Then')
                                                      , pOutInd ('cTop')
                    , OrDefinitionsList               , pOutIndSub ('End If')
                    } ;


AndDefinitionsList = SingleDefinition , { pS , 'AND'  , pOutIndAdd ('If pOk Then')
                                                      , pOutInd ('cAndTop')
                     , AndDefinitionsList             , pOutInd ('cEndAnd')
                                                      , pOutIndSub ('End If')
                     } ;


pRule = pS                     , pOutIndAdd (vbCrLf & Space(iNdent) & 'Function ')
      , pIdentifier            , pOut ('()')
      , pS , '='               , pOut (vbCrLf & Space(iNdent) & 'cPush')
      , DefinitionsList , pS
      , ';'                    , pOut (vbCrLf & Space(iNdent) & 'cDrop')
                               , pOutIndSub ('End Function') ;


pGrammar = «Dim iInPntOld As Integer»
           pS , { «iInPntOld = iInPnt»
                  pRule
                  «Dim i As Integer
                   pOut (vbCrLf & "' ")
                   For i = iInPntOld To iInPnt - 1
                     pOut (Mid(sIn, i, 1))
                     If Mid(sIn, i, 1) = Chr(10) Then pOut ("' ")
                   Next»
                , pS } ;


The block of code in pGrammar prints the source of each rule, commented out in VBA style, below its compiled form.


Particularities (read: where I stumbled)

The execution speed of the generated (sink) code can be controlled well in EBNF+. During the parse one can check whether the code being compiled is optimal. If not, just signal the syntax option as invalid; the parse will then try an alternative. In the source this has been done for a number of definitions:

• String

• Terminal


Recursion 

EBNF+ is devised as a lean top-down parser of type LL(k) [3], and this parser type has inherent features and limitations.

A rule such as:

number = digit , [ number ] ;

cannot be written as:

number = [ number ] , digit ;

Well, actually you can, and the parse will then walk into an infinite recursion for you. The parse will nonetheless stop with an error message, since recursion can only go 100 levels deep (configurable).
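The difference between the two rule forms can be sketched in a few lines of Python. This is an illustration only, not the EBNF+ runtime: the function names, the `DepthExceeded` exception and the way the depth is counted are assumptions made for the sketch; only the limit of 100 levels is taken from the text above.

```python
MAX_DEPTH = 100  # mirrors the configurable limit mentioned in the text


class DepthExceeded(Exception):
    """Raised when rule nesting passes MAX_DEPTH (hypothetical name)."""


def digit(text, pos):
    """Return the position after one digit, or None if there is none."""
    if pos < len(text) and text[pos].isdigit():
        return pos + 1
    return None


def number_right(text, pos, depth=0):
    """number = digit , [ number ] ;  (recursion on the right)."""
    if depth > MAX_DEPTH:
        raise DepthExceeded(pos)
    after = digit(text, pos)
    if after is None:
        return None
    # Input was consumed before recursing, so every level makes progress.
    rest = number_right(text, after, depth + 1)
    return after if rest is None else rest


def number_left(text, pos, depth=0):
    """number = [ number ] , digit ;  (recursion on the left)."""
    if depth > MAX_DEPTH:
        raise DepthExceeded(pos)
    # The rule calls itself before consuming anything, so pos never moves
    # and only the depth guard ends the descent.
    after = number_left(text, pos, depth + 1)
    return digit(text, pos if after is None else after)
```

With these definitions `number_right("2024x", 0)` stops after the four digits, while `number_left` on any input runs into the depth guard.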

Hence in general:

Specify recursive rules such that the recursive non-terminal is placed as far to the right as possible.

Order of alternatives matters in EBNF+

When a rule is found applicable in EBNF+, the parse continues at the originating rule. Once this happens, further alternatives within that rule are never evaluated. For example, when A is a non-terminal specified as:

A = 'a' | 'ab';

the second alternative ('ab') will never be applied.

When the input string 'ab' is parsed with this rule, the rule is found applicable, but only through its first alternative: it matches 'a'. The second character ('b') is then out of scope for this rule and can only be parsed further by the originating rule.

Instead the rule can be rewritten as:

A = 'ab' | 'a'; or

A = 'a', 'b'-'b' | 'ab';

Note that the latter is an easy fix once you have spotted a wrong alternative order. It is ugly though, so don't do it.
General rule of thumb for circumventing the problem of alternative order:

Specify the rules such that the longest string is parsed first and then its shorter, truncated form(s).

Note that this effect does not occur when the rule has the form:

A = 'a', B | 'ab', C ;

When non-terminal B fails, the parse continues with the alternative 'ab', C. When B succeeds, however, the alternative 'ab', C is never parsed.
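The effect of alternative order can be reproduced with a small ordered-choice combinator. The sketch below is plain Python, not EBNF+ output; `literal`, `first_of` and `matches_whole` are names invented for this illustration.

```python
def literal(word):
    """Build a parser for a fixed terminal string."""
    def parse(text, pos):
        return pos + len(word) if text.startswith(word, pos) else None
    return parse


def first_of(*alternatives):
    """Ordered choice: the first alternative that succeeds wins,
    and later alternatives are never tried."""
    def parse(text, pos):
        for alternative in alternatives:
            end = alternative(text, pos)
            if end is not None:
                return end
        return None
    return parse


def matches_whole(rule, text):
    """True when the rule consumes the complete input."""
    return rule(text, 0) == len(text)


a_then_ab = first_of(literal("a"), literal("ab"))  # A = 'a' | 'ab';
ab_then_a = first_of(literal("ab"), literal("a"))  # A = 'ab' | 'a';
```

Parsing "ab" with `a_then_ab` stops after the first character, so the input as a whole is rejected; `ab_then_a` accepts both "ab" and "a".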

In practice one should always be on the alert when designing syntax rules for input strings that have a different meaning but start with the same syntax. An example in EBNF+ is the general rule for pTerminal:

pTerminal = pS, '^'                , pOut ('Call pInterval(Chr(')
          , {pDigit                , pOut (sLastIn)
            }, pS, '..', pS, '^'   , pOut ('),Chr(')
          , {pDigit                , pOut (sLastIn)
            }                      , pOut ('))')
          | pS, '^'                , pOut ('pIn(Chr(')
          , {pDigit                , pOut (sLastIn)
            }                      , pOut ('))')
          | etc.








With the interval alternative placed first, this general definition can be used for parsing specific input strings such as:
MyChrs = ^0..^255 ; and
MyChr = ^23;

If the two alternatives (the interval form and the single-byte form) had been defined the other way around:

pTerminal = pS, '^'                , pOut ('pIn(Chr(')
          , {pDigit                , pOut (sLastIn)
            }                      , pOut ('))')
          | pS, '^'                , pOut ('Call pInterval(Chr(')
          , {pDigit                , pOut (sLastIn)
            }, pS, '..', pS, '^'   , pOut ('),Chr(')
          , {pDigit                , pOut (sLastIn)
            }                      , pOut ('))')
          | etc.

then parsing MyChr = ^23; would go fine, but MyChrs = ^0..^255 would give an error at the dots in the originating rule, since the single-byte alternative would already have completed the parse according to the pTerminal rule.

Of course it is possible to develop a parser in which the order of definitions does not matter. Such parser types, however, allow multiple parse paths with different meanings, which creates ambiguity. In EBNF+ a rule with a wrong alternative order signals a syntax error at parse time. Just correct your parse rule then.

Furthermore, the first-alternative-only property can be put to good use in solving common problems, as the next example shows.

Example 6: Filter strings with a specific syntax out of a random text file

Imagine you need to find strings that match some required grammar in a random text file. For this we can use the EBNF+ property that only the first matching alternative is parsed.

Let MyText be a random text file of characters, and let SearchedString be the grammar of the string to search for. Then the rule:

MyText = { SearchedString | Char };

will successfully find all matching strings.

For example, if you want to find all occurrences of magnetic card tracks that conform to some ISO standard format in a lengthy and complex log file, then the rule:

MyLogs = { IsoTrackData <. Action when found .> | Char };

will perform some action whenever proper card data is found.
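The rule { SearchedString | Char } amounts to a simple scan loop. The Python sketch below shows that loop under stated assumptions: `scan` and `digits_run` are hypothetical helpers written for this illustration, and a run of at least four digits stands in for a real grammar such as IsoTrackData.

```python
def scan(text, searched, on_found):
    """Emulate  MyText = { SearchedString <. action .> | Char } ;"""
    pos = 0
    while pos < len(text):
        end = searched(text, pos)
        if end is not None and end > pos:
            on_found(text[pos:end])  # the action sees the matched string
            pos = end
        else:
            pos += 1                 # the Char alternative skips one character


def digits_run(min_len):
    """Stand-in grammar: a run of at least min_len digits."""
    def parse(text, pos):
        end = pos
        while end < len(text) and text[end].isdigit():
            end += 1
        return end if end - pos >= min_len else None
    return parse


found = []
scan("id 12345 x 12 y 987654", digits_run(4), found.append)
```

After the call, `found` holds the two long digit runs; the short run "12" is skipped character by character.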

Of course you need the IsoTrackData grammar. Just read the ISO standard and create your EBNF+ grammar for it. Below is the worked-out example that delivers, in an Excel sheet, all strings that conform to the ISO 7813 track format.

« Dim sCard As String »
« Dim iLin As Integer »

Digit = "0".."9";
STX = ";" ;
PAN = 17*Digit , [ Digit , [ Digit ] ];
FS = "=" ;
ED = "00".."99" , "01".."12" ;
SC = 3*Digit ;
DD = { Digit };
ETX = "?";
LRC = Digit;

IsoTrackData = STX, PAN, FS, ED, SC, DD, ETX , LRC ;

pLogs = « Cells.Clear »
        « iLin = 1 »
        { IsoTrackData <. DropInSheet .>
        | ^01..^255 } ;

« Sub DropInSheet »
«   Cells(iLin+4, 1) = iLin »
«   Cells(iLin+4, 2) = sLastIn »
«   iLin = iLin + 1 »
« End Sub »

Now set the parse variables and call pLogs to start filtering the card data. That's it.


Intervals 

Terminals can be specified as intervals, such as:

Letter = "a".."z" | "A".."Z";

When quotes are used, each expression in the interval specification must be a single character:
char1..char2 (e.g. "A".."Z").

When the ^ prefix is used, each expression in the interval specification must be an integer 0..255:

^int1..^int2 (e.g. ^0..^255).

Other forms are not supported by EBNF+. Actually it does work for strings of mixed length, but I have to scratch my head over whether that gives predictable and meaningful behaviour.

Let's follow all aspects of EBNF+ on this.

Compile time syntax-semantics of EBNF+ interval support 

pTerminal = ... 


PS, 1 

A ! 



, pOut 

( f Call plnterval(Chr( ' ) 


,{pDigit 



, pOut 

(sLastln) 


} , ps, ' . 

• ' , pS, ' A ' 


, pOut 

(’) ,Chr(') 




, {pDigit 

, pOut 

(sLastln) 




} 

, pOut 

('))') 

pS, " 

! 1 ? 



, pOut 

('Call pInterval(')
(pCharacter

1"" 

) , 

"pS, '. 

.',pS, "" 

, pOut ('"' & sLastln & ' 
, pOut ("") 

' , pOut (',') 

"") 

(pCharacter 

1 "" 

) , 

I! ! !! 


, pOut ("" & sLastln & ' 
, pOut ("") 

, POut (')') 

"") 


Runtime semantics for plnterval 

Function pInterval(sTerm1 As String, sTerm2 As String)
  Dim sTerm As String
  sTerm = Mid(sIn, iInPnt, Len(sTerm1))
  pOk = (sTerm >= sTerm1) And (sTerm <= sTerm2)
  If pOk Then
    iInPnt = iInPnt + Len(sTerm1)
    blnEnd = (iInPnt > Len(sIn))
    sLastIn = sTerm
  End If
End Function

Example rule 

pLetter = 'a'..'z' | 'A'..'Z' ;

Compiled

Function pLetter()
  cPush
  Call pInterval("a", "z")
  If Not pOk Then
    cTop
    Call pInterval("A", "Z")
  End If
  cDrop
End Function

' pLetter = 'a'..'z' | 'A'..'Z' ;

The argument is checked to fall within the two extremes with the string comparison operators ">=" and "<=". For single-character comparisons the result will be fine. Intervals between strings of different lengths are allowed, but the comparison result is only as predictable as that of the host language.

Hence, in general:

Use only single characters or byte values in your interval specifications.

Unless you understand the peculiarities of the host language.
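Why multi-character intervals are risky can be shown with a Python imitation of the interval check. Two assumptions apply: `in_interval` is a hypothetical stand-in for the VBA pInterval routine, and Python compares strings by code point, which need not match VBA's comparison rules (those depend on the Option Compare setting).

```python
def in_interval(text, pos, low, high):
    """Compare a slice of len(low) characters against both bounds,
    the way the runtime interval check is described in the text."""
    term = text[pos:pos + len(low)]
    if term == "":
        return None
    if low <= term <= high:  # lexicographic string comparison
        return pos + len(low)
    return None


single_ok = in_interval("q", 0, "a", "z")      # one character: as expected
single_no = in_interval("Q", 0, "a", "z")      # outside the interval
surprise = in_interval("1z", 0, "10", "99")    # accepted, yet not a number
```

The last call succeeds because "10" <= "1z" <= "99" holds character by character, although "1z" is clearly not a two-digit number between 10 and 99.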


Exception-expression 

Compile time syntax-semantics 

Term = Factor , [ pS , '-' , Exception ];

Exception =        pOutIndAdd ('If pOk Then')
                 , pOutInd ('cPush')
                 , pOutInd ('iInPnt = iInPnt - Len(sLastIn)') ,
            Term , pOutInd ('cDropExcept')
                 , pOutIndSub ('End If');

The Exception may itself be a Term, typically a terminal or a non-terminal, but further expressions are also allowed.

Run time semantics

Function cDropExcept()
  iInPntMax = Application.Max(iInPnt, iInPntMax)
  pOk = Not pOk
  If pOk Then
    cTop
  End If
  iContext = iContext - 1
End Function

Example rule

Function consonant()
  cPush
  letter
  If pOk Then
    cPush
    iInPnt = iInPnt - Len(sLastIn)
    vowel
    cDropExcept
  End If
  cDrop
End Function

' consonant = letter - vowel;

Clarification of the runtime exception mechanism:

A Term with an exception works as follows. Before evaluating the exception (e.g. the non-terminal vowel), the parser first moves iInPnt back to its value before the last successful parse (e.g. of the non-terminal letter). This simply means moving iInPnt back by the length of the last successful parse, so by Len(sLastIn). Parsing then continues with the exception. When that succeeds, the exception was indeed the case, so the Term being parsed should result in False. When the exception parses False, the Term should result in True. More concisely, pOk inverts: pOk is assigned Not pOk.
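The same mechanism, reduced to its essentials, looks as follows in Python. The combinator name `except_` and the helper parsers are assumptions for this sketch; re-parsing from the start position of the term plays the role of moving the input pointer back by Len(sLastIn).

```python
def letter(text, pos):
    """Parse one alphabetic character."""
    return pos + 1 if pos < len(text) and text[pos].isalpha() else None


def vowel(text, pos):
    """Parse one vowel."""
    return pos + 1 if pos < len(text) and text[pos] in "aeiouAEIOU" else None


def except_(term, exception):
    """term - exception: succeed when term matches and exception does not."""
    def parse(text, pos):
        end = term(text, pos)
        if end is None:
            return None
        # Go back to where the term started and try the exception there.
        if exception(text, pos) is not None:
            return None  # the exception applies, so the result inverts
        return end       # the consumed input is that of the term
    return parse


consonant = except_(letter, vowel)  # consonant = letter - vowel;
```

Here `consonant("b", 0)` succeeds, while a vowel or a non-letter at the same position fails.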

In general an unsuccessful parse (a parse that delivers pOk = False) leaves iInPnt at the spot that caused the invalid parse. This also holds for an exception, even though the exception itself parsed OK.

Finally, after a successful parse of a Term, with or without an exception, sLastIn is set to the string last parsed by the Term, not by the exception. A nice example is in the EBNF+ source itself:

pComment = '(*' , { ('*)' | pCharacter) - '*)' } , '*)' ;

A comment is mainly parsed as repeated characters, so character by character. The first '*)' symbol checks whether the input holds this end symbol of a comment; this may require parsing 2 characters, a '*' and a ')'. Whether a character or the end symbol was parsed, the exception then checks whether it was a '*)'. If not, the { and } repeat the parse. If it was, the option becomes False and the repetition stops. The final '*)' symbol then lets the parser step over the full comment.

Overview of EBNF+ expressions to code


Compiled syntax 


Syntax rule 

exp! 


Function NTN() 
cPush 
exp 
cDrop 

End Function 


Public aContext(l To 100, 1 To 6) 


Function jcPush () 

iContext = iContext + 1 

If iContext > 100 Then MsgBox "Context stack overflow" 
aContext(iContext, 1) = ilnPnt 
aContext(iContext, 2) = blnEnd 
aContext(iContext, 3) = Len(sOut) 
aContext(iContext, 4) = sLastln 
aContext(iContext, 5) = iNdent 
aContext(iContext, 6) = 0 
End Function 


Function pDropO 

ilnPntMax = Application.Max(ilnPnt, ilnPntMax) 

' In case of syntax error tell the user 

how far the parse got 
If pOk Then 

sLastln = Mid(sin, aContext(iContext, 1), ilnPnt - 
aContext(iContext, 1)) 

' aContext(iContext,1) is temporarily 

stored ilnPnt 
Else_ 

BH 

End If 

iContext = iContext - 1 
End Function 


Function cTop () 

ilnPnt = aContext(iContext, 1) 
blnEnd = aContext(iContext, 2) 
sOut = Left(sOut, aContext(iContext, 
sLastln = aContext(iContext, 4) 
iNdent = aContext(iContext, 5) 

End Function 


3) ) 


Alternative

NTN = exp1 | exp2 ;

Function NTN()
  cPush
  exp1
  If Not pOk Then
    cTop
    exp2
  End If
  cDrop
End Function

NTN = exp1 | exp2 | exp3 ;

Function NTN()
  cPush
  exp1
  If Not pOk Then
    cTop
    exp2
    If Not pOk Then
      cTop
      exp3
    End If
  End If
  cDrop
End Function

Function cTop()
  iInPnt = aContext(iContext, 1)
  blnEnd = aContext(iContext, 2)
  sOut = Left(sOut, aContext(iContext, 3))
  sLastIn = aContext(iContext, 4)
  iNdent = aContext(iContext, 5)
End Function


Concatenation

NTN = exp1 , exp2;

Function NTN()
  cPush
  exp1
  If pOk Then
    exp2
  End If
  cDrop
End Function

NTN = exp1 , exp2 , exp3;

Function NTN()
  cPush
  exp1
  If pOk Then
    exp2
  End If
  If pOk Then
    exp3
  End If
  cDrop
End Function

[ ]

Option

NTN = [exp];

Function NTN()
  cPush
  cPush
  exp
  pOk = True
  cDrop
  cDrop
End Function

( )

Grouping

NTN = (exp);

Function NTN()
  cPush
  cPush
  exp
  cDrop
  cDrop
End Function

NTN = exp1 | (exp2, exp3);

Function NTN()
  cPush
  exp1
  If Not pOk Then
    cTop
    cPush
    exp2
    If pOk Then
      exp3
    End If
    cDrop
  End If
  cDrop
End Function

{ } 

Repetition 

NTN = {exp}; 

Function NTN() 
cPush 

Do 

cPush 

exp 

cDrop 

Loop While pOk And Not blnEnd 

pOk = True 

cDrop 

End Function 

{ }* 

Repetition 

NTN = {exp}*; 

Function NTN() 
cPush 

cPush 

Do 

cPush 

exp 

cDrop 

Loop While pOk And Not blnEnd 

pOk = (aContext(iContext, 1) <> ilnPnt) 

iContext = iContext - 1 

cDrop 

End Function 


Exception

NTN = exp1 - exp2;

Function NTN()
  cPush
  exp1
  If pOk Then
    cPush
    iInPnt = iInPnt - Len(sLastIn)
    exp2
    cDropExcept
  End If
  cDrop
End Function

+ 

Concord 

NTN = expl | exp2; 

Function NTN() 
cPush 
expl 

If pOk Then

cAndTop

exp2

cEndAnd

End If 
cDrop 

End Function 

Function cAndTop() 

aContext(iContext, 6) = ilnPnt 

ilnPnt = aContext(iContext, 1) 

blnEnd = aContext(iContext, 2) 

sOut = Left(sOut, aContext(iContext, 3)) 

sLastln = aContext(iContext, 4) 

iNdent = aContext(iContext, 5) 

End Function 

f f 

Terminal 

NTN = "terminal"; 

NTN = (terminal|; 

Function NTN() 
cPush 

pin("terminal") 

cDrop 

End Function 

1 ! 1! 

Terminal 

Function NTN() 
cPush 

pin("terminal") 
cDrop 

End Function 



Function pIn(sTerm As String) 

pOk = (Mid(sin, ilnPnt, Len(sTerm)) = sTerm) 

If pOk Then 

ilnPnt = ilnPnt + Len(sTerm) 
blnEnd = (ilnPnt > Len(sin)) 
sLastln = sTerm 

Else 

If ilnPnt >= ilnPntMax Then 
sLastMiss = sTerm 

End If 

End If 

End Function 

*

Factoring

NTN = 123 * exp2;

Function NTN()
  cPush
  vPush(123)
  Do While vTop > 0
    exp2
    DecvTop
  Loop
  vDrop
  cDrop
End Function

Public aVS(1 To 100)

Function vPush(iPushed As Integer)
  iVSP = iVSP + 1
  aVS(iVSP) = iPushed
End Function

Function vTop() As Integer
  vTop = aVS(iVSP)
End Function

Function DecvTop()
  aVS(iVSP) = aVS(iVSP) - 1
End Function

Function vDrop()
  iVSP = iVSP - 1
End Function
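The counted repetition n * exp can be sketched in Python as follows. Two assumptions apply: the name `times` is invented for the sketch, and it stops at the first failing repetition, which the compiled VBA loop above does not show explicitly. Nested factors need no explicit value stack here, because each closure keeps its own counter.

```python
def digit(text, pos):
    """Parse one digit."""
    return pos + 1 if pos < len(text) and text[pos].isdigit() else None


def times(count, rule):
    """count * rule: apply rule exactly count times in a row."""
    def parse(text, pos):
        remaining = count          # plays the role of the pushed counter
        while remaining > 0:
            end = rule(text, pos)
            if end is None:
                return None        # assumption: give up on the first miss
            pos = end
            remaining -= 1         # corresponds to DecvTop
        return pos
    return parse


three_digits = times(3, digit)  # e.g. SC = 3*Digit ;
```

`three_digits("123", 0)` consumes all three characters, whereas "12x" fails on the third repetition.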


Terminal as single byte

NTN = ^13;

Function NTN()
  cPush
  pIn(Chr(13))
  cDrop
End Function

;

Termination

Range

NTN = "A".."Z"; 

Function NTN() 
cPush 

Call plnterval("A","Z") 
cDrop 

End Function 

Function plnterval(sTerml As String, sTerm2 As String) 

Dim sTerm As String 

sTerm = Mid(sln, ilnPnt, Len(sTerml)) 

If sTerm = "" Then 
pOk = False 

Else 

pOk = (Asc(sTerm) >= Asc(sTerml)) And (Asc(sTerm) <= 

Asc(sTerm2)) 

If pOk Then 

ilnPnt = ilnPnt + Len(sTerml) 
blnEnd = (ilnPnt > Len(sin)) 
sLastln = sTerm 

End If 

End If 

End Function 

(* *)

Comment

(* This is a comment of 36 characters *)

Will not be compiled

Compiled state machines


State definition

NTN # exp ;

Static Function NTN(Optional R As Boolean)
  Dim iState As Integer
  If R Then
    iState = 0
    pOk = True
    blnEnd = False
    Exit Function
  End If
  If blnEnd Then Exit Function
  If iState = 0 Then
    exp
    If pOk Then iState = 1
  End If
  If iState = 1 Then iState = 0
  pOk = True
End Function


Concatenated states

NTN # x , y , z ;

[State diagram: transitions labelled x=True, y=True]

Static Function NTN(Optional R As Boolean)
  Dim iState As Integer
  If R Then
    iState = 0
    pOk = True
    blnEnd = False
    Exit Function
  End If
  If blnEnd Then Exit Function
  If iState = 0 Then
    x
    If pOk Then iState = 1
  End If
  If pOk And iState = 1 Then
    y
    If pOk Then iState = 2
  End If
  If pOk And iState = 2 Then
    z
    If pOk Then iState = 3
  End If
  If iState = 3 Then iState = 0
  pOk = True
End Function


Alternative states

NTN # x | y | z ;

[State diagram: transitions labelled x=True, y=True, z=True]

Static Function NTN(Optional R As Boolean)
  Dim iState As Integer
  If R Then
    iState = 0
    pOk = True
    blnEnd = False
    Exit Function
  End If
  If blnEnd Then Exit Function
  If iState = 0 Then
    x
    If pOk Then iState = 1
  End If
  If pOk And iState = 1 Then iState = 0
  If Not pOk Or iState > 1 Then
    If iState = 0 Then
      y
      If pOk Then iState = 2
    End If
    If pOk And iState = 2 Then iState = 0
    If Not pOk Or iState > 2 Then
      If iState = 0 Then
        z
        If pOk Then iState = 3
      End If
    End If
  End If
  If iState = 3 Then iState = 0
  pOk = True
End Function


Some concatenated and alternative states

NTN # x , y | a , b ;

Static Function NTN(Optional R As Boolean)
  Dim iState As Integer
  If R Then
    iState = 0
    pOk = True
    blnEnd = False
    Exit Function
  End If
  If blnEnd Then Exit Function
  If iState = 0 Then
    x
    If pOk Then iState = 1
  End If
  If pOk And iState = 1 Then
    y
    If pOk Then iState = 2
  End If
  If pOk And iState = 2 Then iState = 0
  If Not pOk Or iState > 2 Then
    If iState = 0 Then
      a
      If pOk Then iState = 3
    End If
    If pOk And iState = 3 Then
      b
      If pOk Then iState = 4
    End If
  End If
  If iState = 4 Then iState = 0
  pOk = True
End Function

Compiled semantics in syntax

« »

unconditional code

NTN = exp « MsgBox = "Hi" » ;

Function NTN()
  cPush
  exp
  MsgBox = "Hi"
  cDrop
End Function

<. .>

conditional code

NTN = exp <. MsgBox = "Hi" .> ;

Function NTN()
  cPush
  exp
  If pOk Then
    MsgBox = "Hi"
  End If
  cDrop
End Function

7. EBNF+ compared to Regular expressions

The application area of EBNF+ is wider than that of regular expressions (REs):

EBNF+ covers CFGs, including regular grammars
EBNF+ specifies grammars

Regular expressions (REs) are famed, or notorious, for analysing and processing string patterns. REs are concise, often no longer than one text line even for elaborate tasks. They are commonly supported in Unix-like shell tools and are built into many of today's computer languages.

Since EBNF covers CFGs, it also covers regular grammars. So far, EBNF has only been available in bulky packages unfit for simple applications. New in this arena is EBNF+, which can be added to any computer language, lean and mean. Hence it may compete with REs and go beyond. Let's compare.

Below are 2 examples of solutions in EBNF+ and RE for the same problem. In the examples the host language for EBNF+ is still VBA and for RE it is Perl (sorry, I could not find a RE-VB combination).


Example 7: Convert a user input value to Celsius or Fahrenheit depending on its format

The task in this example is to convert user input from Fahrenheit to Celsius or vice versa, depending on the unit symbol F or C that the user added. For example, the user enters "100C" and the system responds "212F", or the user enters "100F" and the system responds "37.78C". (The example and its RE solution are from [4].)

Step 1a Define the rule to check user input

In EBNF

Using the EBNF notation, the user grammar is:

pDigit¹ = '0'..'9';
pInteger = ['-'|'+'] , pDigit , {pDigit} ;
pTemp = pInteger , ('C'|'F');

To mimic RE style this could be written as an EBNF one-liner:

pTemp=['-'|'+'],'0'..'9',{'0'..'9'},('C'|'F');

Nevertheless, in general I do not advocate the short form. Grammars easily get complicated. To manage complexity one had better engineer a structured approach:

clear and appropriate naming

abstraction into hierarchic layers.

With EBNF this is second nature.

To allow the user to input reals, spaces, and capital or small letters, the above grammar can be extended:

pSpaces = {' '} ;
pDigit = '0'..'9' ;
pNumber = pDigit , {pDigit} ;
pReal = ['+'|'-'] , pNumber , [".", pNumber] ;
pTemp = pReal, pSpaces, ('c'|'C'|'f'|'F');

Note that no new operators are introduced to extend the definition, only the operands named in the problem requirement specification.

In RE

Using RE (the part between m/ and /) in Perl (the rest), user input should conform to the pattern:

$Temp =~ m/^[-+]?[0-9]+[CF]$/

This statement uses symbols from both RE and Perl:

the Perl symbol =~ links the Perl variable $Temp to the RE enclosed between the Perl symbol pair m/ and /

the RE symbol pair ^ and $ ensures that the whole input consists of the enclosed character pattern.

the RE symbol pair [ and ]+ requires at least one match of the enclosed character set.

the RE symbol pair [ and ]? signals that the enclosed character set is optional.

Conciseness comes at a price. REs tend to be hard to read.

To allow the user to input reals, spaces, and capital or small letters, the RE can be extended:

$Temp =~ m/^([-+]?[0-9]+(\.[0-9]+)?)\s*([CF])$/i

Note that for almost every new requirement there is a new operator symbol! Some come from RE, some from Perl. This makes learning RE about as fast as learning Chinese.

¹ You may wonder what the prefix "p" is doing in all the EBNF+ examples. It is not a requirement; it only reminds me that the name denotes something that will parse. Furthermore, the best descriptive names I come up with always seem to have been taken already by the host language.

Step 1b Define the semantics (here in EBNF+ with VBA)

«Dim dTemp As Single»
«Dim sTemp As String»

pDigit = '0'..'9' ;
pInteger = pDigit , {pDigit} ;
pFloat = ["+"|"-"] , pInteger , [ "." , pInteger ] ;
pTemp = pFloat <.dTemp = CDec(sLastIn) .>
        , ("C" <.sTemp = CStr((dTemp*9/5)+32) & " F" .>
          |"F" <.sTemp = CStr((dTemp-32)*5/9) & " C" .>
          ) ;

«
Function ConvertTemp
  pInit
  sIn = InputBox("Enter a temperature (e.g., 32F, 100C):")
  pTemp
  If pOk Then
    MsgBox sIn & " equates to " & sTemp
  Else
    MsgBox "Sorry, don't understand: " & sIn
  End If
End Function
»

The semantics specify that the input converts to Celsius or Fahrenheit depending on whether it was given in Fahrenheit or Celsius, respectively. Note that the EBNF+ <. .> constructs specify the related actions.

Note the different way of parameter passing as compared to RE:

EBNF+ uses only the String variable sLastIn. It contains the string parsed by the last parsed rule.

RE uses $1, $2 etc. These contain the values of the respective ()-clauses parsed in the last RE.
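For readers without Excel at hand, the pTemp grammar and its actions can be followed in a hand-written Python recogniser. This is a sketch under assumptions: `parse_temp` is a hypothetical function name, the output is rounded to two decimals (the VBA version uses CStr without rounding), and the slice `text[:pos]` plays the role of sLastIn after pFloat.

```python
def parse_temp(text):
    """Follow pTemp = pFloat , ("C" | "F") and return the converted
    temperature as a string, or None when the input does not parse."""
    pos = 0
    if pos < len(text) and text[pos] in "+-":       # ["+" | "-"]
        pos += 1
    digits_start = pos
    while pos < len(text) and text[pos].isdigit():  # pInteger
        pos += 1
    if pos == digits_start:
        return None
    if pos < len(text) and text[pos] == ".":        # [ "." , pInteger ]
        end = pos + 1
        while end < len(text) and text[end].isdigit():
            end += 1
        if end > pos + 1:
            pos = end
    value = float(text[:pos])                       # dTemp = CDec(sLastIn)
    unit = text[pos:]
    if unit == "C":
        return f"{value * 9 / 5 + 32:.2f} F"
    if unit == "F":
        return f"{(value - 32) * 5 / 9:.2f} C"
    return None
```

For example, `parse_temp("100C")` yields "212.00 F" and `parse_temp("100F")` yields "37.78 C".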


Step 1b Define the semantics (here with RE in Perl)

print "Enter a temperature (e.g., 32F, 100C):\n";
$input = <STDIN>;                  # Read one line from the user
chomp($input);                     # Remove the ending newline from $input.

if ($input =~ m/^([-+]?[0-9]+(\.[0-9]+)?) *([CF])$/)
{
    # If we get in here, we had a match.
    # $1 is the number, $3 is "C" or "F" ($2 is the optional fraction).
    $InputNum = $1;                # Save to named variables to make the ...
    $type     = $3;                # ... rest of the program easier to read.
    if ($type eq "C") {            # 'eq' tests if two strings are equal
        # The input was Celsius, so calculate Fahrenheit
        $celsius = $InputNum;
        $fahrenheit = ($celsius * 9 / 5) + 32;
    } else {
        # If not "C", it must be an "F", so calculate Celsius
        $fahrenheit = $InputNum;
        $celsius = ($fahrenheit - 32) * 5 / 9;
    }
    # At this point we have both temperatures, so display the results.
    printf "%.2f C is %.2f F\n", $celsius, $fahrenheit;
} else {
    # The initial regex did not match, so issue a warning.
    print "Sorry, don't understand: \"$input\".\n";
}

Example 8: Prepare a response to a received mail

(Example and RE solution from [3, page 53].)

Imagine we receive the following mail from Elvis, dated Fri Feb 29 11:15 2002 (EmailIn.txt):

Received: from elvis@localhost by tabloid.org (8.11.3) id KA8CMY
Received: from tabloid.org by gateway.net (8.12.5/2) id N8XBK
To: jfriedl@regex.info (Jeffrey Friedl)
From: elvis@tabloid.org (The King)
Date: Fri, Feb 29 2002 11:15
Message-Id: <2002022939939.KA8CMY@tabloid.org>
Subject: Be seein' ya around
Reply-To: elvis@hh.tabloid.org
X-Mailer: Madam Zelda's Psychic Orb [version 3.7 PL92]

Sorry I haven't been around lately. A few years back I checked
into that ole heartbreak hotel in the sky, ifyaknowwhatImean.
The Duke says "hi".
Elvis

To start our reply, we need to prepare from the above a file (EmailOut.txt) that contains:

To: elvis@hh.tabloid.org (The King)
From: jfriedl@regex.info (Jeffrey Friedl)
Subject: Re: Be seein' ya around

On Fri, Feb 29 2002 11:15 The King wrote:
|> Sorry I haven't been around lately. A few years back I checked
|> into that ole heartbreak hotel in the sky, ifyaknowwhatImean.
|> The Duke says "hi".
|> Elvis

EBNF+ solution to prepare a response to a received mail

Declare the help variables:

«
Dim sFromAdd As String
Dim sFromName As String
Dim sDate As String
Dim sSubject As String
Dim sReply As String
Dim sMess As String
»

Define the grammar of the email structure. Drill top-down towards the individual header lines. Then add the semantic actions to capture the header fields in the help variables:

pMail = pFirstLine , {pHeader} , pMessage ;

pFirstLine = pLine , pEOL ;

pHeader = ( "Received: "   , pLine |
            "To: "         , pLine |
            "From: "       , pWord <.sFromAdd = sLastIn.>
                           , " (" , pText <.sFromName = sLastIn.>
                           , ")" , pLine |
            "Date: "       , pLine <.sDate = sLastIn.> |
            "Message-Id: " , pLine |
            "Subject: "    , pLine <.sSubject = sLastIn.> |
            "Reply-To: "   , pLine <.sReply = sLastIn.> |
            "X-Mailer: "   , pLine
          ) , pEOL ;

pMessage = { pLine <.sMess = sMess & "|> " & sLastIn & vbCrLf.> , pEOL } ;

pLine = { ^32 .. ^127 } ;
pWord = { ^33 .. ^127 } ;
pText = { ^32 .. ^127 - ")" } ;
pEOL = ... ;  (* end-of-line definition illegible in the source *)

Now generate the new text, using the fields captured from EmailIn.txt. Here we also generate the response file, including error handling:

«
Function GenerateMail
  sMess = ""
  If ParseString(FileIn("EmailIn.txt"), "pMail") Then
    sOut = "To: " & sReply & " (" & sFromName & ")" & vbCrLf _
         & "From: jfriedl@regex.info (Jeffrey Friedl)" & vbCrLf _
         & "Subject: Re: " & sSubject & vbCrLf _
         & vbCrLf _
         & "On " & sDate & " " & sFromName & " wrote:" & vbCrLf _
         & sMess
  Else
    sOut = sOut & " something went wrong"
  End If
  Call FileOut("EmailOut.txt", sOut)
End Function
»



RE-Perl solution to prepare a response to a received mail [3]

# Go through all lines and capture the required fields into the variables.
while ($line = <>)
{
    # Check for the header-ending blank line with the expression ^\s*$.
    # If so, drop out of the while loop.
    if ($line =~ m/^\s*$/) {
        last;
    }
    if ($line =~ m/^Subject: (.*)/i) {
        $subject = $1;
    }
    if ($line =~ m/^Date: (.*)/i) {
        $date = $1;
    }
    if ($line =~ m/^Reply-To: (\S+)/i) {
        $reply_address = $1;
    }
    if ($line =~ m/^From: (\S+) \(([^()]*)\)/i) {
        $reply_address = $1;
        $from_name = $2;
    }
}

# Generate the output file using the captured fields.
print "To: $reply_address ($from_name)\n";
print "From: Jeffrey Friedl <jfriedl\@regex.info>\n";
print "Subject: Re: $subject\n";
print "\n";    # blank line to separate the header from message body.
print "On $date $from_name wrote:\n";
while ($line = <>) {
    print "|> $line";
}

Problem vs Solution

Although both solutions show similar constructions, the development approach differs fundamentally. EBNF forces the programmer to see and specify the input as a production of some grammar. The input as a whole must then be subdivided and structured further. In the EBNF+ example above we see a main structure such as:

pMail = pFirstLine, {pHeader}, pMessage;

Such a view signifies a top-level design approach. Since RE is line oriented, a structured approach is out of scope, let alone further refinement and separation of syntax and semantics.

To search for specific headers, the RE solution uses a while statement to go through the text. In EBNF the text is seen as a sequence of lines with a header. Sequences are basic to grammars, also for big data, and are hence supported by the { } and { }* EBNF+ symbols. In the RE examples the expressions do not cope with structure above line level. Instead a Perl while control statement is used, with a separate statement to detect the end of the headers and even a separate statement to escape from the loop. Not really elegant.

EBNF challenges the user to find the grammar. Once it is determined, the user has full control over the content and can juggle structured pieces of this content with ease. The RE approach is typically bottom-up programming: go through the lines one by one and match on keywords and other markings to see whether data should be modified or captured for later use.

In short, in the RE solution the syntax check is performed by both RE and Perl. In EBNF+ there is a very strong separation of syntax and semantics. Every character in the source is checked by statements in EBNF. The grammar rules in the EBNF+ specification step through the source, not the statements in the host language.


Sparse vs Rich 

In the RE solution pattern check is done by a combination of RE and Perl symbols, forming a rich set of non-descriptive 
symbols, which can be overwhelming to non-native Perl programmers that casual need to parse. 

Pattern check in EBNF+ is in EBNF only; a mere 12 meta symbols do the job. This may make an EBNF specification 
lengthier, but certainly easier to read, develop and maintain. 


Input versus line orientation 

RE treats input as human text with standard commands to cope with new lines, spaces and upper and lower case 
characters. EBNF+ treats input as any sequence of bytes with no preference if from machines or humans. 


Conclusion of the RE-EBNF+ comparison 

I have no experience with RE, but the example solutions in the RE bibles [4] [5] in my view really look like magic 
compared to the simple and clear recipes of EBNF+. Nothing wrong with wizards, but I would not let them build my 
house. 

I challenge all RE advocates to really try one non-trivial task in EBNF+, and I bet you will change your mind or at least 
stretch it. 

8. Examples 


Find strings (T022) 


Example 9: Mark all strings that match sSearch in a text file sIn 


If input sIn looks like 

Some match in this line. 
Ended with a match. 
Appended with some matches. 
Prefixed with prematch. 
Embedded in postmatches. 

Then output sOut should be 

Some !match! in this line. 
Ended with a !match!. 
Appended with some !match!es. 
Prefixed with pre!match!. 
Embedded in post!match!es. 

The solution syntax and semantics for the above requirement: 

Char = ^0 .. ^255 ; 

sSearch = "match"; 

pStart = { sSearch <. sOut2 = sOut2 & "!" & sLastIn & "!" .> 
         | Char    <. sOut2 = sOut2 & sLastIn .> 
         }; 

pStart is the start symbol. 

The rule makes use of the EBNF+ property that the alternative Char is only parsed 
if sSearch is not successful. 
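
The ordered-alternative behaviour that this rule relies on can be imitated outside EBNF+. The following is a minimal Python sketch, not the code that EBNF+ generates; the function name mark_matches is an assumption made for illustration. At every position it first tries the search string and only falls back to consuming a single character when that fails.

```python
def mark_matches(text: str, search: str = "match") -> str:
    """Mark every occurrence of `search` in `text` with exclamation marks."""
    if not search:
        raise ValueError("search must not be empty")
    out = []
    pos = 0
    while pos < len(text):
        if text.startswith(search, pos):
            # First alternative (sSearch) succeeds: emit the marked match.
            out.append("!" + search + "!")
            pos += len(search)
        else:
            # Second alternative (Char) is tried only when sSearch fails.
            out.append(text[pos])
            pos += 1
    return "".join(out)


if __name__ == "__main__":
    print(mark_matches("Embedded in postmatches."))
```

Because the search string is always tried first, no separate bookkeeping is needed to decide which characters belong to a match.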


Find words (T023) 


Example 10: Mark all strings that match sSearch as a word in a text file sIn 


If input sIn looks like 

Some match in this line. 
Ended with a match. 
Appended with some matches. 
Prefixed with prematch. 
Embedded in postmatches. 

Then output sOut should be 

Some !match! in this line. 
Ended with a !match!. 
Appended with some matches. 
Prefixed with prematch. 
Embedded in postmatches. 


Solution (T023a) 


pStart is the start symbol. 

sSearch applies only when its input is a Word AND (+) matches 
the searched string "match". 

The rule makes use of the EBNF+ property that the alternative Char is only parsed 
if sSearch is not successful. 

pStart is the start symbol. Here it finds a match for sSearch, or 
parses the text as one word, or as one character. 

Alternative solution (T023b) 

Char = ^0 .. ^255; 

Letter = ^48 .. ^122; 

Word = {Letter}*; 

sSearch = Word + "match"; 

pStart = { sSearch <. sOut2 = sOut2 & "!" & sLastIn & "!" .> 
         | Word    <. sOut2 = sOut2 & sLastIn .> 
         | Char    <. sOut2 = sOut2 & sLastIn .> 
         }; 


CtrlChar = ^0 .. ^56; 
Char = ^0 .. ^255; 

Word = {Char - CtrlChar}*; 

sSearch = Word + "match"; 

pStart = { sSearch <. sOut2 = sOut2 & "!" & sLastIn & "!" .> 
         | Word    <. sOut2 = sOut2 & sLastIn .> 
         | Char    <. sOut2 = sOut2 & sLastIn .> 
         }; 
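
The three alternatives (a word equal to the search string, any other word, a single character) can be mimicked in a few lines of Python. This is only an illustrative sketch under assumptions: the name mark_words is hypothetical, and a letter is taken to be whatever str.isalpha accepts, which is narrower than the character ranges used in the grammars above.

```python
def mark_words(text: str, search: str = "match") -> str:
    """Mark `search` only where it occurs as a complete word."""
    out = []
    pos = 0
    while pos < len(text):
        # Collect the longest run of letters starting at pos (the Word rule).
        end = pos
        while end < len(text) and text[end].isalpha():
            end += 1
        if end > pos:
            word = text[pos:end]
            # sSearch: the input is a Word AND equals the search string.
            out.append("!" + word + "!" if word == search else word)
            pos = end
        else:
            # Char: fallback for anything that is not part of a word.
            out.append(text[pos])
            pos += 1
    return "".join(out)


if __name__ == "__main__":
    print(mark_words("Ended with a match."))
```

Consuming a whole word before comparing is what prevents "prematch" and "matches" from being marked.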


Correlate strings (T024) 

To find strings that look alike one can use a simple correlation method. Compare all characters in sKey against 
the string in the text starting at text position 1, 2 etc. For each sKey comparison report how many characters 
match, that is, report the weight of the match. For a total match the weight is the length of sKey. For a total 
mismatch the weight is zero. 
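
The weight calculation itself is easy to state in code. The Python sketch below is an assumption-laden illustration (the function name correlation_weights is hypothetical and positions are 0-based rather than 1-based); it slides the key along the text and counts the characters that agree at each start position.

```python
def correlation_weights(text: str, key: str) -> list[int]:
    """Return, for each start position, how many characters of key match."""
    n = len(key)
    weights = []
    for start in range(len(text) - n + 1):
        window = text[start:start + n]
        # The weight is the number of positions where key and window agree.
        weights.append(sum(1 for a, b in zip(key, window) if a == b))
    return weights


if __name__ == "__main__":
    print(correlation_weights("xxabcxx", "abc"))
```

A total match yields a weight equal to the key length and a total mismatch yields zero, as described above.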

Example 11: Match 10-digit numbers with a correlation of 10 or 9 digits 


Solution (T024) 


pChar = ^0 .. ^255 ; 

pDigit = '0' .. '9' ; 

pStart = <<W=0>> 
         { ( 10* ( pDigit <.W=W+1.> 
                 | pChar 
                 ) 
           ) - pChar 
         | pChar <.EnlistW.> 
         } ; 

« 
Dim W as Integer 
Dim Wold as Integer 

Sub EnlistW 
  If W>7 AND W<Wold then 
    sOut2 = sOut2 & sLastIn & "<" & cStr(Wold) & ">" 
  Else 
    sOut2 = sOut2 & sLastIn 
  End if 
  Wold = W 
  W = 0 
End Sub 
» 

The rule to parse a full key even when some characters do not match is: 

10* (pDigit | pChar) 

where pChar guarantees that all 10 characters in the input will be checked. 

Embedding this rule to check at all character positions is: 

pStart = { ( 10* (pDigit | pChar) ) - pChar 
         | pChar 
         } ; 

Some explanation for this tricky rule: 

the first pChar parses the character if it is not a digit, so in effect the parse will always be successful. 

Let us name a single check of rule 10* (pDigit | pChar) a "round". Then, 

the second pChar always triggers that the first part never conforms to the grammar checked so far, in 
effect setting the parse back to the first character checked by pDigit in this round. 

the last pChar always renders true, with the result that the 
first character checked by pDigit in the next round 
starts one character further. 

The weight counter is denoted by W and the previous 
weight by Wold. In this example only weights that drop after 
a weight of 10 or 9 are reported in the output. 

Example Input 

This sample text has some numbers like 
1234567890 that will match exactly. 
But some less exact numbers can maybe also be 
found such as 
inc 1 letter 12 34567890, 
1 digit less 123457890, 
Of course random numbers too: 465663a564712, 
4 65663ab564712. 
But some number carry less weight 123g456h789. 

Resulting Output 

This sample text has some numbers like 
12<10>3<9>4567890 that will match exactly. 
But some less exact numbers can maybe also be 
found such as 
inc 1 letter 12 <9>34567890, 
1 digit less 12<9>3457890, 
Of course random numbers too: 46566<9>3a564712, 
4 65 663ab5 64 712. 
But some number carry less weight 123g456h789. 


Progressive calculator examples 

A popular example in the parser world is the calculator. The task is to simulate an electronic calculator: you key in an 
arithmetic expression and the system displays the answer. Let us try this in EBNF+ by stepwise improvement. We 
start with the common dyadic operators without precedence, then add precedence, then allow grouped clauses, and finally 
add a monadic operator for the fun of education. 


Example 12: A simple calculator 

Below is a simple calculator. It executes the operators in the order in which they are parsed, from left to right. So 1+2*3 
yields 9. 
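
To make the left-to-right behaviour concrete, here is a small Python sketch that evaluates an expression strictly in parse order, without any precedence. It is an illustration under assumptions, not EBNF+ output: the name calc_left_to_right is hypothetical, and the input is assumed to contain only unsigned numbers and the four operators, without spaces.

```python
def calc_left_to_right(expr: str) -> float:
    """Evaluate number (operator number)* strictly from left to right."""
    pos = 0

    def number() -> float:
        nonlocal pos
        start = pos
        while pos < len(expr) and (expr[pos].isdigit() or expr[pos] == "."):
            pos += 1
        # float() raises ValueError when no valid number was found.
        return float(expr[start:pos])

    result = number()
    while pos < len(expr):
        op = expr[pos]
        pos += 1
        value = number()
        # Apply each operator as soon as its right operand is parsed.
        if op == "+":
            result += value
        elif op == "-":
            result -= value
        elif op == "*":
            result *= value
        elif op == "/":
            result /= value
        else:
            raise ValueError(f"unexpected operator {op!r}")
    return result


if __name__ == "__main__":
    print(calc_left_to_right("1+2*3"))
```

Each operator is applied immediately, which is why the multiplication in 1+2*3 acts on the running total 3 instead of on 2.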

Step 1: Specify syntax and semantics 

«Dim dResult as Single» 

pDigit = "0".."9"; 
pInteger = {pDigit}*; 

pFloat = pInteger , [".", pInteger]; 

pExpr = pFloat <.dResult = cDec(sLastIn).> 
      , { "+" , pFloat <.dResult = dResult + cDec(sLastIn).> 
        | "-" , pFloat <.dResult = dResult - cDec(sLastIn).> 
        | "*" , pFloat <.dResult = dResult * cDec(sLastIn).> 
        | "/" , pFloat <.dResult = dResult / cDec(sLastIn).> 
        } ; 

«Sub Calculator () 
  dResult = 0.0 
  Do 
    InitParser (InputBox("Enter formula: ","Calculator example using EBNF+")) 
    pExpr 
    If pOk And blnEnd Then 
      MsgBox "Result of " & sLastIn & " is: " & cStr(dResult) 
    Else 
      MsgBox "Ended." 
      Exit Do 
    End If 
  Loop 
End Sub 
» 

In yellow, the heart of the solution: the syntax and semantic interface. 

The rest forms the interface with the user. It prompts the user to enter an expression. If pExpr can parse the 
entered expression, its value is presented and the loop repeats. If it cannot parse the expression, the loop 
ends. 

In the further examples we skip this user interface, since it is the same throughout except for the start symbol. 

The start symbol here is pExpr. 


Step 2: Compile and load into your application 


Step 3: Execute 

Execute the example by calling the subroutine named Calculator. 

[Screenshots: request and response dialogs.] 


Example 13: An improved calculator with operator precedence 

Step 1a define syntax and semantics (here for 2-level precedence operators +, -, *, /) 


« 
Dim dResult (1 to 100) 
Dim dSP As Integer 

Function dPush(dIn as Single) 
  dSP = dSP + 1 
  dResult(dSP) = dIn 
End Function 

Function dPop As Single 
  dPop = dResult(dSP) 
  dSP = dSP - 1 
End Function 
» 


pDigit = "0".."9"; 
pInteger = {pDigit}*; 
pFloat = ["+"|"-"], pInteger , [".", pInteger]; 

pExprMul = pFloat <.dPush(cDec(sLastIn)).> 
         , { "*" , pFloat <.dPush(dPop * cDec(sLastIn)).> 
           | "/" , pFloat <.dPush(dPop / cDec(sLastIn)).> 
           } ; 

pExprAdd = pExprMul 
         , { "+" , pExprAdd <.dPush(dPop + dPop).> 
           | "-" , pExprAdd <.dPush(-dPop + dPop).> 
           } ; 


« 
Sub Calculator() 
  dSP = 0 
  Do 
    InitParser (InputBox("Enter formula: ","Calculator example using EBNF+")) 
    pExprAdd 
    If pOk And blnEnd Then 
      MsgBox "Result of " & sLastIn & " is: " & dPop 
    Else 
      MsgBox "Ended." 
      Exit Do 
    End If 
  Loop 
End Sub 
» 

We can stepwise build up the syntax from grammars that have no operator precedence to ones that have multiple 
levels of precedence. 

Let o be an operator definition, f a number and a, b, c expressions. We can then stepwise add operators with higher 
precedence as follows. 

No precedence: 

a = f , { o , f } ; 

Precedence o1 > precedence o2: 

a = f , { o1 , f } ; 

b = a , { o2 , b } ; 

Operators of equal precedence (e.g. o1 & o2 and o3 & o4) can simply be defined using the alternative construction: 

a = f , { o1 , f 
        | o2 , f 
        } ; 

b = a , { o3 , b 
        | o4 , b 
        } ; 

In the grammar above, this progression is worked out for 2 precedence levels. 

Now that the syntax is solved, what about the semantics? 

Evaluating expressions with multiple levels of precedence can be solved with a stack data structure. Parsed numbers 
are simply pushed onto the stack; a dyadic operator (an operator that has 2 arguments) takes the two top items off 
the stack and pushes the result of the operation onto the stack. When the expression is fully parsed, only one element is 
on the stack: the result. We pop it from the stack to display it to the user. 

And the stack is empty again. 
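
The stack discipline described here can be tried out on its own. The Python sketch below assumes the parser has already fired its semantic actions in order, for example push 1, push 2, push 3, multiply, add for the input 1+2*3. The name run_actions and the list-of-strings encoding of the actions are assumptions made for illustration only.

```python
from operator import add, sub, mul, truediv

# Dyadic operators: each takes the two top stack items.
OPERATORS = {"+": add, "-": sub, "*": mul, "/": truediv}


def run_actions(actions: list[str]) -> float:
    """Run push/operator actions on a stack and return the single result."""
    stack: list[float] = []
    for action in actions:
        if action in OPERATORS:
            right = stack.pop()   # top of stack is the right operand
            left = stack.pop()
            stack.append(OPERATORS[action](left, right))
        else:
            stack.append(float(action))  # a parsed number is pushed
    if len(stack) != 1:
        raise ValueError("expression did not reduce to a single value")
    return stack.pop()  # popping the result leaves the stack empty


if __name__ == "__main__":
    print(run_actions(["1", "2", "3", "*", "+"]))
```

The check that exactly one element remains corresponds to the observation that a fully parsed expression leaves only the result on the stack.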

Start symbol is pExprAdd! 


The above support of precedence does not only concern arithmetic operators; it of course holds for all operators. A nice 
example is EBNF itself. Here the alternative ("|" symbol) has lower precedence than the concatenation ("," symbol). 
Check the rules and they show the same structure as our 2-level precedence form: 

a = f , { o1 , f } ; 

b = a , { o2 , b } ; 

For the EBNF grammar (simplified a bit): 

SingleDefinition = Term , { "," , Term } ; 

DefinitionList = SingleDefinition , { "|" , DefinitionList } ; 

So no innovative thinking here. 


Step 1b define syntax and semantics (here for 3-level precedence operators +, -, *, /, ^) 

pDigit = "0".."9"; 
pInteger = {pDigit}*; 

pFloat = ["+"|"-"], pInteger , [".", pInteger]; 


pExprPow = pFloat <.dPush(cDec(sLastIn)).> 
         , { "^" , pFloat <.dPush(dPop ^ cDec(sLastIn)).> 
           } ; 

pExprMul = pExprPow 
         , { "*" , pExprMul <.dPush(dPop * dPop).> 
           | "/" , pExprMul <.dPush(1/dPop * dPop).> 
           } ; 


pExprAdd = pExprMul 
         , { "+" , pExprAdd <.dPush(dPop + dPop).> 
           | "-" , pExprAdd <.dPush(-dPop + dPop).> 
           } ; 


This is the syntax/semantics for 3-level precedence. The stack and the Calculator routine are as before. 

The pattern is as follows: 

a = f , { o1 , f } ; 

b = a , { o2 , b } ; 

c = b , { o3 , c } ; 


Compare this to the 2-level precedence case and see the pattern progression. 
The start symbol here is pExprAdd again. 


Step 2 Compile the code 


Step 3 Start the calculator example 

Microsoft Excel message box: Result of 1+2*3^4/5-1 is: 32.4 


Wow, it really works! 

But more important, the code is short and very readable. 



Example 14: Calculator supporting parentheses 

Step 1b define syntax and semantics for adding the grouped clause: allow a sub-expression inside "(" and ")" 
symbols 

pDigit = "0".."9"; 

pInteger = {pDigit}*; 

pFloat = ["+"|"-"], pInteger , [".", pInteger]; 


pNumber = pFloat <.dPush(cDec(sLastIn)).> 
        | pExprGrp; 


pExprPow = pNumber 
         , { "^" , pNumber <.dSwap.> 
                           <.dPush(dPop ^ dPop).> 
           } ; 

pExprMul = pExprPow 
         , { "*" , pExprMul <.dPush(dPop * dPop).> 
           | "/" , pExprMul <.dSwap.> 
                            <.dPush(dPop / dPop).> 
           } ; 


pExprAdd = pExprMul 
         , { "+" , pExprAdd <.dPush(dPop + dPop).> 
           | "-" , pExprAdd <.dPush(-dPop + dPop).> 
           } ; 


pExprGrp = "(" , pExprAdd , ")" ; 

Now how to tackle groups? The approach is as follows. In the previous example the parse in the end 
acts on the part of the input string named pFloat. Introducing grouped clauses actually means adding an alternative to pFloat: 
instead of a pFloat you may parse an expression enclosed between parentheses. So every pFloat is replaced with 
pFloat OR a grouped clause. This combination is named pNumber, so as not to clutter the rule of the 
operator with the highest precedence. 

That is all we have to change in the syntax. 

A nice thing is that the semantics stay almost the same: the added alternative pExprGrp pushes a new number 
on the stack, just as pFloat does. 
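
The idea that a group is just another way to obtain a number can be sketched as a small recursive-descent evaluator in Python. This is an illustration under assumptions and not the EBNF+ generated parser: all names are hypothetical, only +, -, * and / are handled, and the loops are left-associative, which is a choice of this sketch rather than a transcription of the rules above.

```python
from operator import add, sub, mul, truediv


def evaluate(text: str) -> float:
    """Evaluate +, -, *, / with parentheses using an explicit stack."""
    pos = 0
    stack: list[float] = []

    def peek() -> str:
        return text[pos] if pos < len(text) else ""

    def number() -> None:
        # A number is either a literal or a parenthesised group.
        nonlocal pos
        if peek() == "(":
            pos += 1
            expr_add()            # the group pushes its own result
            if peek() != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        else:
            start = pos
            while peek().isdigit() or peek() == ".":
                pos += 1
            stack.append(float(text[start:pos]))

    def binary(operand, ops) -> None:
        nonlocal pos
        operand()
        while peek() in ops:
            op = ops[peek()]
            pos += 1
            operand()
            right = stack.pop()
            left = stack.pop()
            stack.append(op(left, right))

    def expr_mul() -> None:
        binary(number, {"*": mul, "/": truediv})

    def expr_add() -> None:
        binary(expr_mul, {"+": add, "-": sub})

    expr_add()
    if pos != len(text):
        raise ValueError(f"unexpected input at position {pos}")
    return stack.pop()


if __name__ == "__main__":
    print(evaluate("2*(3+(4-1))"))
```

Only the number function had to learn about parentheses; the operator levels are unchanged, which mirrors the small syntax change described above.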


Start symbol is again pExprAdd! 


Step 2 Compile and 3 Execute 


[Screenshots: the input box "Calculator example using EBNF+" prompting "Enter formula:", and the Microsoft Excel message box showing the result.] 


Not surprising by now that it works. 


Example 15: Calculator with added monadic operators 


Step 1c define syntax and semantics for adding the grouped clause: allow a sub-expression inside "(" and ")" 
symbols 

pDigit = "0".."9"; 
pInteger = {pDigit}*; 

pFloat = ["+"|"-"], pInteger , [".", pInteger]; 


pNumber = pFloat <.dPush(cDec(sLastIn)).> 
        | pExprGrp 
        | pExprMon; 


pExprPow = pNumber 
         , { "^" , pNumber <.dSwap.> 
                           <.dPush(dPop ^ dPop).> 
           } ; 


pExprMul = pExprPow 
         , { "*" , pExprMul <.dPush(dPop * dPop).> 
           | "/" , pExprMul <.dSwap.> 
                            <.dPush(dPop / dPop).> 
           } ; 


pExprAdd = pExprMul 
         , { "+" , pExprAdd <.dPush(dPop + dPop).> 
           | "-" , pExprAdd <.dPush(-dPop + dPop).> 
           } ; 

pExprGrp = "(" , pExprAdd , ")" ; 

pExprMon = "Sqr" , pExprPow <.dPush(Sqr(dPop)).>; 

Start symbol is again pExprAdd! 

Step 2 Compile and 3 Execute 






Example 16: Infix to RPN convertor 

In the above calculator example we interpreted the keyed-in expression and presented its evaluated result. The 
evaluation is thereby performed in postfix operator manner: first the operands are stored and then their 
operator. So the infix expression 1*2+3^4 becomes 1 2 * 3 4 ^ + in postfix. The latter form is commonly named 
RPN, Reverse Polish Notation. Our computer chips work in postfix order. Compilers that deliver assembly or 
machine code have the task of converting the source arithmetic expressions into RPN. Below is an EBNF+ solution for 
this conversion. 


The calculator solution executes in postfix order. For the convertor solution we have to deliver code for 
execution instead of executing it. So we can just replace all operator calculation statements with code-producing 
statements. E.g. dPush(dPop / dPop) becomes pOut(" DIV"), where DIV is an imagined operator name for an 
imaginary RPN machine. 
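
The same replacement of calculating actions by emitting actions can be shown in a short Python sketch. It is a hedged illustration, not the EBNF+ solution: the function name to_rpn is hypothetical, the operator names EXP, MUL, DIV, ADD and SUB are taken from the text, the Sqr operator is left out for brevity, and the loops are left-associative by choice of this sketch.

```python
def to_rpn(text: str) -> str:
    """Convert an infix expression to RPN for an imaginary stack machine."""
    pos = 0
    out: list[str] = []

    def peek() -> str:
        return text[pos] if pos < len(text) else ""

    def number() -> None:
        nonlocal pos
        if peek() == "(":
            pos += 1
            expr_add()
            if peek() != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
            return
        start = pos
        while peek().isdigit() or peek() == ".":
            pos += 1
        if start == pos:
            raise ValueError(f"expected a number at position {pos}")
        out.append(text[start:pos])   # emit the operand instead of pushing it

    def binary(operand, names) -> None:
        nonlocal pos
        operand()
        while peek() in names:
            name = names[peek()]
            pos += 1
            operand()
            out.append(name)          # emit the operator after both operands

    def expr_pow() -> None:
        binary(number, {"^": "EXP"})

    def expr_mul() -> None:
        binary(expr_pow, {"*": "MUL", "/": "DIV"})

    def expr_add() -> None:
        binary(expr_mul, {"+": "ADD", "-": "SUB"})

    expr_add()
    if pos != len(text):
        raise ValueError(f"unexpected input at position {pos}")
    return " ".join(out)


if __name__ == "__main__":
    print(to_rpn("1*2+3^4"))
```

No stack is needed at conversion time: the order in which the rules complete already is the postfix order.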


Step 1 define syntax and semantics 

pDigit = "0".."9"; 
pInteger = {pDigit}*; 

pFloat = ["+"|"-"], pInteger , [".", pInteger]; 

pNumber = pFloat <.pOut(sLastIn & " ").> 
        | pExprGrp 
        | pExprMon; 


pExprPow = pNumber 
         , { "^" , pNumber <.pOut(" EXP").> 
           } ; 

pExprMul = pExprPow 
         , { "*" , pExprMul <.pOut(" MUL").> 
           | "/" , pExprMul <.pOut(" DIV").> 
           } ; 

pExprAdd = pExprMul 
         , { "+" , pExprAdd <.pOut(" ADD").> 
           | "-" , pExprAdd <.pOut(" SUB").> 
           } ; 

pExprGrp = "(" , pExprAdd , ")" ; 

pExprMon = "Sqr" , pExprPow <.pOut(" SQR").>; 


« 
Sub Calculator() 
  Do 
    InitParser (InputBox("Enter formula: ","Infix to RPN convertor example using EBNF+")) 
    sOut = "" 
    pExprAdd 
    If pOk And blnEnd Then 
      MsgBox "RPN notation of " & sLastIn & " is: " & sOut 
    Else 
      MsgBox "Ended." 
      Exit Do 
    End If 
  Loop 
End Sub 
» 

Step 2 Compile & load module 

Step 3 Execute 

[Screenshots: user request and system response dialogs.] 

That was all rather trivial, wasn't it? 
So start using it! 